The integer datatypes in C can be either signed or unsigned. Sometimes, it's obvious which should be used; for negative values you clearly should use signed types, for example. In many cases there is no obvious choice – in that case it usually doesn't matter which you use. Usually, but not always. Sometimes, picking the wrong kind can introduce subtle bugs in your program that, unless you know what to look out for, can catch you off guard and have you searching for the problem for hours.
I've mentioned a few of these occasions in Tonc here and there, but I think it's worth going over them again in a little more detail. First, I'll explain how signed integers work, what the difference between signed and unsigned is, and where potential problems can come from. Then I'll discuss some common pitfalls so you know what to expect.
1 Basics
The signedness of a variable refers to whether it can be used to represent negative values or not. Unsigned variables can only hold zero and positive values; signed variables can be either positive or negative.
In the computer world, signedness is mostly a matter of interpretation. Say you have a variable that is N bits long. This is enough room for 2^{N} distinct numbers, but it says nothing about which range of numbers you should be using them for. Interpreted as an unsigned integer, its range would be [0, 2^{N}−1]. Under a signed interpretation, you'd use some bit patterns for negative numbers. There are actually several ways of doing this, but the most commonly used is known as two's complement, which leads to a [−2^{N−1}, 2^{N−1}−1] range: half negative and half non-negative.
1.1 Two's complement theory
Two's complement is sometimes seen as an awkward system, but it actually follows quite naturally when you only have a fixed number of digits to write down numbers with. Consider the whole line of positive and negative integers. As you move away from zero, the numbers will grow larger and larger. Now suppose you have a counting device composed of a limited number of digits, each of which can only display numbers 0 through 10−1. With N digits, you only have room for 10^{N} different numbers, and once those are used up (at 10^{N}−1), the counter returns to 0 and counting effectively resets. In essence, the number on the counter works in modulo 10^{N}.
The key is that this works in both positive and negative directions. As far as the counter is concerned, 0 and 10^{N} are the same thing. This being the case, you can argue that −1 (that is, the number before zero) is equivalent to 10^{N}−1; and −2 ≡ 10^{N}−2, and so on. Note that this works regardless of what 10 actually is; it can be ten (decimal), two (binary) or sixteen (hexadecimal).
The 10^{N} possible numbers form a window over the number line, but where the window starts is up to the user. For signed numbers, you can move the window so that the upper half of the 10^{N} range is interpreted as negative numbers.
Fig 1 shows how this works for 8-bit numbers (written in hex for convenience). The black numbers represent the entire number line, where numbers can have as many digits as you need. With only two nybbles, the counter repeats every 100h = 256 values. FFh, 1FFh, but also −1 all reduce to the same symbol, namely FFh. In Fig 2 you can see how the available symbols are mapped to either signed or unsigned values. In the unsigned case, numbers simply count from 0 to FFh; for signed, the top half of the symbol range is put on the left side of zero and is used for negative numbers.
The mathematical reasoning behind all this goes like this. Assume for convenience that N = 1, so that 0 is equivalent to 10 and in fact to every multiple of 10. By definition, subtracting a value from itself gives 0. Because subtraction is merely addition of the negative value, you get the following:
(1) $0 = x - x = x + (-x) \equiv 10$
The term −x in the last step should be seen as a unit, call it C. Numerically, C is the number that, when added to x, gives 10. In decimal, if x = 1, then C = 9. C is called the 10's complement of x, because it's what's needed to complete the 10. It's called the two's complement in binary, because then 10 equals two.
In binary, there is an alternative way to calculate the two's complement of a number. Subtracting a number from 2^{N}−1 (that is, from all bits set) is equivalent to inverting all its bits, so you get:
(2) $-x \equiv 2^{N} - x = (2^{N} - 1 - x) + 1 = {\sim}x + 1$
Using two's complement^{(1)} for negative numbers has some interesting properties. First, subtraction and addition are basically the same thing. This is nice for arithmetic implementers for two reasons: the same hardware can be used for both operations, and it can be used for both positive and negative numbers.
Second, because the top half of the range is now used for negative numbers, the most significant bit can be seen as a sign bit. Note: a sign bit, not the sign bit. There is a subtle linguistic difference here. When talking about the sign bit, one may think of it as a single bit that indicates the sign, with the other bits giving the magnitude. In that scheme, 8-bit +1 and −1 would be `0000 0001' and `1000 0001', respectively. In two's complement, however, +1 and −1 are actually `0000 0001' and `1111 1111' (the sum of which is `1 0000 0000' ≡ 0, as it should be).
1.2 Declaring signed or unsigned
In the end, whether a particular group of bits is signed or unsigned is a matter of interpretation. For example, the 8-bit group `1111 1111' can be either 255 or −1, depending on how you want to look at it. You can't determine the signedness from just the bits themselves.
Also, when you've decided you're going to use a signed interpretation, whether the group forms a negative number or not depends on the size of the group. For example, consider the two bytes `01 FF'. As separate bytes, these would form +1 and −1, respectively. However, if you view them as a single 16-bit integer (a `short'), it forms 0x01FF, which is a positive number.
In C, you specify signedness when you declare a variable. The general rule is that an integer is signed unless the keyword `unsigned' is used. The exception to the rule is `char', whose default signedness is platform- and compiler-dependent! Be careful with this particular datatype.
unsigned int ib;    // Unsigned integer.
short sa;           // Signed 16-bit integer.
unsigned short sb;  // Unsigned 16-bit integer.
char ca;            // ??signed 8-bit integer (depends on platform and compiler).
signed char cb;     // Signed 8-bit integer.
unsigned char cc;   // Unsigned 8-bit integer.
Because they're shorter and more descriptive, the following typedefs are often used for variable declarations. Basically, it's ‘s’ or ‘u’ for signed or unsigned, respectively, followed by the size of the type in bits. Unsigned variants are also sometimes indicated by ‘u’ + typename.
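| Base type | Signed | Unsigned | Unsigned (‘u’ + typename) |
|-----------|--------|----------|---------------------------|
| char      | s8     | u8       | uchar                     |
| short     | s16    | u16      | ushort                    |
| int/long  | s32    | u32      | uint                      |
| long long | s64    | u64      |                           |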
In assembly, you can't declare the signedness of variables, because there's no such thing as variables. There are only labels, and how you use those labels determines what the related data are. Technically, there is only one datatype: the 32-bit word, corresponding to C's int or long. The other datatypes are essentially emulated, or defined by which memory instructions you use: LDRB/LDRSB/STRB for bytes and LDRH/LDRSH/STRH for halfwords. For most data operations, signedness is irrelevant and as such mostly ignored. Only in a few cases does the sign actually matter, and as these are essentially the topic of the rest of the article, we'll get to those eventually.
2 Potential problems
The following sections are cases where signedness may become problematic. I say “may”, because often it just works out. But that's just the thing: it can work most of the time and then things can go horribly wrong all of a sudden. The root of the problem comes down to one thing: negative numbers; usually, negative numbers becoming large positive numbers when interpreted as unsigned values.
For example, 32-bit signed −1 = 0xFFFFFFFF = unsigned 4294967295 (= 2^{32}−1). If nothing else, remember that part.
2.1 Sign extension, casting and shifting
When you go from a small datatype to a larger one, you're essentially adding a new set of bits at the top, and these bits have to be initialized in a meaningful way. The addition of these bits should have no effect on the value itself. For example, +1 should remain +1 and −1 should remain −1. What this boils down to for two's complement is that the new bits need to be filled with the sign bit of the old value. This is called sign extension, because the top bit (the sign bit) is extended into all the higher bits. There is also zero-extension, which is when the higher bits are zeroed out. These two forms effectively correspond to signed and unsigned casting^{(2)}.
Conversions of this kind actually happen all the time, without any kind of direct intervention from the programmer. Data operations are always done in CPU words and any time you use a smaller datatype, there is the need to sign- or zero-extend.
This also brings forth the question of which type of extension will be used: sign- or zero-extension. As the following bit of code shows, it depends on the signedness of the variable you're converting from. The 8-bit variables sc and uc are both initialized with 0xFF, which is either −1 or 255 (you can use either of those too, by the way). After that, these are used to initialize signed or unsigned words.
As you can see from the output, the values in the words correspond to the signedness of the bytes, not the words. Also note that printing sc (the signed byte) gives 0xFFFFFFFF and not the 0xFF you initialized it with. The byte itself still holds 0xFF, since 0xFFFFFFFF is too large to fit into a byte; however, whenever you use it in an expression, it's automatically extended to word size. This becomes great fun when you later compare it to 0xFF again.
void test_conversion()
{
    s8 sc= 0xFF;    // 8-bit -1 (and 255)
    u8 uc= 0xFF;    // 8-bit 255 (and -1)
    s32 sisc= sc, siuc= uc;
    u32 uisc= sc, uiuc= uc;
    printf(" sc: %4d=%08X ; uc:%4d=%08X\n", sc, sc, uc, uc);
    printf("sisc: %4d=%08X ; siuc:%4d=%08X\n", sisc, sisc, siuc, siuc);
    printf("uisc: %4d=%08X ; uiuc:%4d=%08X\n", uisc, uisc, uiuc, uiuc);
    printf("sc==0xFF : %s\n", (sc==0xFF ? "true" : "false") );
    /* Output:
     sc:   -1=FFFFFFFF ; uc: 255=000000FF
    sisc:   -1=FFFFFFFF ; siuc: 255=000000FF
    uisc:   -1=FFFFFFFF ; uiuc: 255=000000FF
    sc==0xFF : false
    Warnings issued (for sc=0xFF):
     warning C4305: 'initializing' : truncation from 'const int' to 'signed char'
     warning C4309: 'initializing' : truncation of constant value
    */
}
Sign- and zero-extension also play a role in right-shifts. When using shifts for arithmetic (a right-shift is shorthand for division by a power of two), you want the sign preserved. For example, when dividing −16 = 0xFFFF:FFF0 by 16 (right-shift by 4), you want the result to be −1 (= 0xFFFF:FFFF), and not 268435455 (= 0x0FFF:FFFF). The right-shift that preserves the sign is the arithmetic right-shift, and is used for signed numbers. For unsigned numbers, or if the variable is considered a set of bits instead of a single number, a logical right-shift is appropriate, since that uses zero-extension.
In assembly, arithmetic and logical right-shift are called ASR and LSR, respectively. In Java and other languages where the keyword unsigned does not exist, the difference is indicated by >> (sign-extend) and >>> (zero-extend). In C, however, both types use the same symbol: >>. As such, you cannot tell which type of extension is used from just the expression; you'd have to look at the signedness of the operands (including temporaries) to see if it's a logical or arithmetic right-shift.

This ambiguity of the shift symbol in C can be a major source of pain in fixed-point calculations. Since unsigned wins over signed when the two are mixed in an expression, if you have an unsigned variable at any point of the calculation, all subsequent calculations are unsigned too and you can kiss negative numbers goodbye. If everything starts going wrong as soon as you move in another direction or if rotations aren't calculated properly, this will be the cause.
The code below illustrates the problem in a very common situation. You have a position p, and a directional vector for movement, u. Since you want sub-pixel control of these, you use fixed-point notation for both (I'm assuming a non-FPU system here). The u vector is a unit vector (say, cos(α), sin(α)); to get to the full velocity vector, we have to multiply u by some speed. The procedure comes down to something like this:
p_{new} = p_{old} + speed·u 
In the example, I'm only considering the x-component for convenience. Now, because position and direction can have negative components, those would be signed. The speed, however, is a length and therefore always positive, so it makes sense to make it unsigned, right? Well, yes and no. As you can see from the result, mostly no.
With speed = +1 and u_{x} = −1, the end result should be +1*−1 = −1, which would be 0xFFFFFF00 in Q8 fixed-point notation. However, it isn't, thanks to the unsignedness of speed, which makes subsequent arithmetic unsigned so the right-shift does not sign-extend. So instead of the small step you intended, you get a giant leap into no man's land.
{
    // Assume movement for 2 directions, with Q8 for everything.
    // a = look direction.
    // p = (px, py) = position.
    // u = (ux, uy) = ( cos(a), sin(a) )
    int px= 0;              // Starting position.
    int ux= -1<<8;          // Moving backwards.
    uint speed= 1<<8;       // Unsigned as speed's always >= 0, right?
    px = px + (speed*ux>>8);    // Fixed point motion. Result should be -1<<8.
    printf("px : %d=%08X\n", px, px);
    /* Result:
    px : 16776960=00FFFF00
    In other words: NOT the -1<<8 = 0xFFFFFF00 you were after.
    */
}
This mistake is depressingly easy to make, even for those who generally think about which datatype to use. Especially those people, as they're prone to optimize prematurely and automatically pick unsigned for a variable that will never be negative. The danger is that unsigned arithmetic takes over in mixed expressions, which can screw things up at later right-shifts.
Bottom line: variables used in fixed-point calculations should be signed. Always.
2.2 Division
This isn't really a signed-vs-unsigned item per se, but integer division behaves in a peculiar way for negative numbers. It becomes one, however, when you throw right-shifts into the fray, which don't quite work as a division equivalent anymore for negative numbers. To discriminate between integer and normal division, I will use ‘\’ for integer division in this section. Note also that the modulo operation is intimately tied to division, so this section applies to that as well.
What integer division comes down to is taking a normal division and throwing away the remaining fraction. For example, 7 / 4 = 1¾; the integer division is just 1. This is also true for negative numbers: −7 / 4 = −1¾, so −7 \ 4 = −1. In short, integer division rounds towards zero. With bit-shifting, however, you get something slightly different. Theoretically, x>>n is equivalent to x \ 2^{n}. For positive numbers, this is true: 7>>2 in binary is 00000111>>2 = 00000001 = 1. But with −7>>2 you get 11111001>>2 = 11111110 = −2. Division by right-shift always rounds towards negative infinity.
The upshot of this difference is that for negative numbers, the results of x \ 2^{n} and x>>n will be out of sync, as Table 3 illustrates. They still give identical results for positive numbers though.
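| x (dec) | x \ 4 | x>>2 (dec) | x (bin)  | x>>2 (bin) |
|---------|-------|------------|----------|------------|
| −9 | −2 | −3 | 11110111 | 11111101 |
| −8 | −2 | −2 | 11111000 | 11111110 |
| −7 | −1 | −2 | 11111001 | 11111110 |
| −6 | −1 | −2 | 11111010 | 11111110 |
| −5 | −1 | −2 | 11111011 | 11111110 |
| −4 | −1 | −1 | 11111100 | 11111111 |
| −3 | 0 | −1 | 11111101 | 11111111 |
| −2 | 0 | −1 | 11111110 | 11111111 |
| −1 | 0 | −1 | 11111111 | 11111111 |
| 0 | 0 | 0 | 00000000 | 00000000 |
| 1 | 0 | 0 | 00000001 | 00000000 |
| 2 | 0 | 0 | 00000010 | 00000000 |
| 3 | 0 | 0 | 00000011 | 00000000 |
| 4 | 1 | 1 | 00000100 | 00000001 |
| 5 | 1 | 1 | 00000101 | 00000001 |
| 6 | 1 | 1 | 00000110 | 00000001 |
| 7 | 1 | 1 | 00000111 | 00000001 |
| 8 | 2 | 2 | 00001000 | 00000010 |
| 9 | 2 | 2 | 00001001 | 00000010 |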
There are some other consequences besides the obvious difference in results. First, there's how compilers deal with it. Compilers are very well aware that a bit-shift is faster than division and one of the optimizations they perform is replacing divisions by shifts where appropriate^{(3)}. For unsigned numbers the division will be replaced by a single shift. However, for signed variables some extra instructions have to be added to correct the difference in rounding.
Second, note that the standard integer division does not give an equal distribution of results: there are more results in the zero bin. Shift-division spreads the results around evenly. In some cases, you will want to use the shift version for that reason. One clear example of this would be tiling: using the ‘proper’ integer division would give you odd-looking results.
Table 3 shows that for negative numbers, integer division and right-shift don't give the same results. If you do want the same results, the following equations can be used. Given x < 0 and N = 2^{n}, then
$x \backslash N = (x + N - 1) \gg n$
$x \gg n = (x - N + 1) \backslash N$
GCC will use the x\N equivalence to produce signed integer division if possible.
2.3 Comparisons
The last area where signedness can be a factor is comparisons. The next bit of code is from my implementation of a filled circle renderer with boundary clipping. The circle is centered on (x_{0}, y_{0}). Variables x and y are local variables that keep track of where we are on the circle; because these can be negative, they must be signed. Variables dstW and dstH are the destination image's width and height. Since width and height are unsigned by definition, it'd make sense to make these unsigned, right? Right?
int dstP= srf->pitch/2;                     // used in arithmetic, so signed.
uint dstW= srf->width, dstH= srf->height;   // Unsigned by definition.
u16 *dstD= ((u16*)srf->data)+(y0*dstP);
int x=0, y= rad, d= 1-rad, left, right;
...
// Side octants
left=  x0-y;
right= x0+y;
if(right>=0 && left<=dstW)      // Trivial rejection: skip if fully out of bounds
{
    if(left<0)      left= 0;            // Clip left
    if(right>=dstW) right= dstW-1;      // Clip right
    // Render at scanlines y0-x and y0+x
    if(inRange(y0-x, 0, dstH))
        armset16(color, &dstD[-x*dstP+left], 2*(right-left+1));
    if(inRange(y0+x, 0, dstH))
        armset16(color, &dstD[+x*dstP+left], 2*(right-left+1));
}
...
Well, apparently not. When I tested this, right and bottom edge clipping went fine, but when the circle went over the top or left edge, it disappeared completely.
The problem lies with the `if(right>=0 && left<=dstW)' line, which does the trivial rejection test. Variables left and right are the left and rightmost edges of the scanline of the circle. If this is completely to the left of the screen (right < 0) or to the right of the screen (left ≥ dstW) then there's nothing to do. Technically, the tests on that line are correct, so the code should work.
The reason it doesn't actually occurs a few lines earlier: the definition of dstW as an unsigned variable. Because of this, the second condition is an unsigned comparison. Now think of what happens when left moves over the left edge of the screen. left becomes a (small) negative number, which is converted to a positive number for the comparison. A large positive number for that matter – one that's quite a bit larger than the width of the image, and as a result the routine thinks the circle is out of bounds.
So again, a routine went all wonky because I assumed that, since a width is always positive, using an unsigned variable would be a good idea.
The worst part of this particular bit, however, is that I should have known this. The compiler actually issues a warning for this type of thing:
warning: comparison between signed and unsigned integer expressions
Or at least it would have if I hadn't disabled the warning because the message was cropping up everywhere in my normal and sign-safe for-loops. Let this be a lesson: disable warnings at your own risk and for Offler's sake do not ignore them.
2.4 Well, duh
The problems covered above are the subtle ones, where you have to be aware of some of the details that go into the C language itself. There are also a few issues where the programmer really should have known they were going to be a problem from the start.
The first example is, again, one that can occur when optimizing prematurely. You may have heard that loops work better when you count down instead of count up, because in machine code a subtraction is an automatic comparison to zero. So, a clever programmer may turn this:
for(i=0; i<size; i++)
{
// Do whatever
}
into this:
for(i=size-1; i>=0; i--)
{
// Do whatever
}
There are two problems with this code. First, the change probably will not matter with modern compilers because they are aware of the equivalence and can do this conversion themselves^{(4)}, so there's nothing to gain from this.
The real problem, however, is the terminating condition: `i>=0'. Since i is unsigned, it can never be negative, and therefore the condition is always true.
The second example involves bitfields. As it happens, bitfields can be signed or unsigned as well. For the most part, handling this is like handling normal signedness, but there is one situation where you have to be careful.
{
    struct Foo {
        int s7  : 7;    // 7-bit signed
        uint u7 : 7;    // 7-bit unsigned
        int s1  : 1;    // 1-bit signed
        uint u1 : 1;    // 1-bit unsigned
    };
    struct Foo f= { -1, -1, 1, 1 };
    printf("s7: %3d\nu7: %3d\ns1: %3d\nu1: %3d\n\n", f.s7, f.u7, f.s1, f.u1);
    /* Results:
    s7:  -1     // Inited to -1
    u7: 127     // Inited to -1
    s1:  -1     // Inited to  1 (!)
    u1:   1     // Inited to  1
    */
}
In the code above I've created a bitfielded struct with both signed and unsigned members. There are two 7-bit fields and two 1-bit fields, and these are initialized to −1 and +1, respectively. The values are then printed.
The 7-bit fields work as you might expect. f.s7 is −1, as it's signed, and f.u7 is 127, which is the 7-bit equivalent of −1. The interesting case is f.s1. This is initialized to 1, but comes out as −1, because for a single signed bit the possibilities are 0 and −1, and not 0 and +1! Without this knowledge, a later test like `f.s1==1' might give unexpected results.
3 Summary
So, summarizing:
 Unsigned variables only represent zero and positive numbers; signed ones can have positive or negative values. Negative numbers are usually represented via two's complement, which is based on the cyclical nature of counters when you have a limited number of digits.
 In C, integers are signed unless specified otherwise, except for char, whose signedness is compiler-dependent.
 Careless use of signed and unsigned types can result in subtle runtime bugs with not-so-subtle results. Usually, what happens is that a negative number is reinterpreted as a very large positive number and everything goes banana-shaped.
 Unsigned wins over signed in mixed expressions. If one of the operands is unsigned (and at least as wide as an int), the operation will use unsigned arithmetic. This can cause problems for divisions, modulos, right-shifts and comparisons.
 For negative numbers, division/modulo by 2^{n} is not quite the same as right-shifts/ANDs. Analyse which is best for your situation, then act accordingly.
 Ignore compiler warnings at your own peril.
 The place where a bug manifests is not always the place where it originates. The declaration of variables matters! Do not forget this when debugging or when asking for assistance.
There isn't really a hard rule on when to use which signedness, but here are a few guidelines nonetheless.
 If a variable can, in principle, have negative values, make it signed. If it represents a physical quantity (position, velocity, mass, etc), make it signed.
 A variable that represents logical values (bools, pixels, colors, raw data) should probably be unsigned.
 And now the big one: just because a variable will always be positive doesn't mean it should be unsigned. Yes, you may waste half the range, but using signed variables is usually safer. If you must have the larger range (for the smaller datatypes, for example), consider defining the storage variables unsigned, but convert them to local signed ints when you're really going to use them.
 If mathematical symbols were gods, the minus sign would be Loki. Be extra careful when you encounter it. If there are minus signs anywhere in the algorithm, or even the potential for negative numbers, everything should be done with signed numbers.
Notes:
 1. Or any 10's complement, really.
 2. One could say that zero-extension is just a form of sign-extension; it's just that the sign for an unsigned number is always positive.
 3. And please let the compiler do its job in this regard: the low operator precedence of shifts makes their use awkward and error-prone. If you mean division, then use division.
 4. Although they may well do it incorrectly: turning the decrementing loop into an incrementing one. Point is, the compiler may not follow exactly what you're doing anyway.