The standard C functions for copying and filling are
memcpy() and memset(). They're part of the standard library, are easy
to use and are often implemented with some optimizations so that they're
usually faster than manual looping. The DKA version, for example, will fill as
words if the alignment and size allow for it. This can be much
faster than doing the loops yourself.
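To make the word-versus-byte decision concrete, here's a rough sketch of a fill that takes the word path only when both the address and the size are multiples of 4, and falls back to bytes otherwise. It only illustrates the idea and is not the actual DKA code; fill_sketch is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: NOT the real library implementation. */
static void *fill_sketch(void *dst, int value, size_t size)
{
    uint8_t byte = (uint8_t)value;

    if ((((uintptr_t)dst | size) & 3) == 0) {
        /* Fast path: word-aligned start and a whole number of words. */
        uint32_t word = byte * 0x01010101u;   /* replicate byte 4 times */
        uint32_t *dst32 = dst;
        for (size_t n = size / 4; n > 0; n--)
            *dst32++ = word;
    } else {
        /* Slow path: one byte store per byte. */
        uint8_t *dst8 = dst;
        while (size-- > 0)
            *dst8++ = byte;
    }
    return dst;
}
```

With an aligned buffer and a size of 8 this does two word stores; with a size of 3 it does three byte stores.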
There is, however, one small, annoying fact about these two: they're not VRAM-safe. If the alignment and size aren't right for word transfers, they will transfer bytes instead. Not only will this be slow, of course, but because you can't write to VRAM in bytes, the data will be corrupted.
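One way around the byte-write problem is to never store a lone byte: read the halfword that contains it, patch the correct half, and write the whole halfword back. The sketch below assumes a little-endian target (which the GBA and NDS are) and uses the hypothetical name store8_via16; it isn't one of the functions discussed here, just an illustration of the principle.

```c
#include <stdint.h>

/* Hypothetical helper: store one byte using only a 16-bit access.
   Assumes little-endian byte order, so an odd address is the high byte. */
static void store8_via16(void *dst, uint8_t value)
{
    uintptr_t addr = (uintptr_t)dst;
    /* Round the address down to the halfword that contains the byte. */
    volatile uint16_t *half = (volatile uint16_t *)(addr & ~(uintptr_t)1);
    uint16_t old = *half;

    if (addr & 1)
        *half = (uint16_t)((old & 0x00FF) | ((uint16_t)value << 8));
    else
        *half = (uint16_t)((old & 0xFF00) | value);
}
```

Doing this per byte would be slow, so it only makes sense for the unaligned edges of a transfer; everything in between can go as full words.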
The solutions for this have mostly come down to “so don't do that then”. Often, this can be sufficient: tiles in VRAM are word-aligned by definition, and source graphics data can and should be word-aligned anyway. However, now that I'm finally working on a bitmap blitter for 8bpp and 16bpp, I find that it's simply not enough. So I wrote the following set of functions to serve as replacements.
My main goal here was to create smallish and portable replacements, not to
have the greatest and fastestest code around, because that's rather
platform-dependent. Yes, even the difference between GBA and NDS should matter,
because of the differences in ldr/str times and caching.
There are 5 functions here. The main functions are the copy routine and
__toncset, for copying and filling words, respectively. The other 3 are
interfaces to __toncset for filling 8-bit, 16-bit and 32-bit data; you need
these for, say, filling with a color instead of 8-bit data. For the rest of the
discussion, I will use the name “toncset” for the internal routine for
convenience.
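The split between one internal fill routine and three thin interfaces can be sketched roughly as follows. Every name here (word_fill_sketch, fill8_sketch, fill16_sketch, fill32_sketch) is a hypothetical stand-in, and the body is my own minimal take on a word-only fill, assuming a little-endian target and, for the 16-bit and 32-bit versions, a destination aligned to the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal routine: fill `size` bytes with the repeating
   32-bit pattern `fill`, using word accesses only (little-endian). */
static void word_fill_sketch(void *dst, uint32_t fill, size_t size)
{
    if (dst == NULL || size == 0)
        return;

    uintptr_t addr = (uintptr_t)dst;
    size_t left = addr & 3;            /* bytes in the word before dst */
    volatile uint32_t *dst32 = (volatile uint32_t *)(addr - left);
    uint32_t mask;

    if (left != 0) {
        if (left + size < 4) {
            /* The whole range sits inside a single word. */
            mask = ((1u << (size * 8)) - 1) << (left * 8);
            *dst32 = (*dst32 & ~mask) | (fill & mask);
            return;
        }
        /* Unaligned head: keep the low `left` bytes, fill the rest. */
        mask = (1u << (left * 8)) - 1;
        *dst32 = (*dst32 & mask) | (fill & ~mask);
        dst32++;
        size -= 4 - left;
    }

    /* Main run: plain word stores. */
    for (size_t n = size / 4; n > 0; n--)
        *dst32++ = fill;

    /* Tail: fill the low `size` bytes, keep the rest. */
    size &= 3;
    if (size != 0) {
        mask = (1u << (size * 8)) - 1;
        *dst32 = (*dst32 & ~mask) | (fill & mask);
    }
}

/* Thin interfaces: widen the value to a full word, convert count to bytes. */
static void fill8_sketch(void *dst, uint8_t value, size_t count)
{
    word_fill_sketch(dst, value * 0x01010101u, count);
}

static void fill16_sketch(void *dst, uint16_t value, size_t count)
{
    word_fill_sketch(dst, value * 0x00010001u, count * 2);
}

static void fill32_sketch(void *dst, uint32_t value, size_t count)
{
    word_fill_sketch(dst, value, count * 4);
}
```

The 16-bit interface is the one you'd reach for when clearing a 16bpp bitmap to a color, for example fill16_sketch(dst, 0x7FFF, n) for n white pixels.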