On high-performance processors, branches are bad because:
The best strategies for avoiding these penalties are a combination of:
Note that &&, || and ?:, in general, also introduce branches. Also, the relational operations ==, !=, >, <, >=, and <= can generate branches when used as arguments to arithmetic or assignment operators.
Because && and || short-circuit, each one typically compiles to a branch of its own. For instance,
if( i < N && x[i] > 0 )
is semantically identical to
if( i < N )
    if( x[i] > 0 )
In general, a compiler will generate two branches for this code, one for each condition:
# if( i < N )
cmplt p,i,N
iffalse L1
# if( x[i] > 0 )
ld t,x,i
cmpgt q,t,0
iffalse L1
L0:
# ... body of the if statement ...
L1:
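The equivalence between the && form and the nested form can be checked directly in C. The sketch below is illustrative only: the function names positive_at_and, positive_at_nested and check_forms_agree are hypothetical and do not come from the text above. It shows that the second test is never evaluated when the first one fails, which is why the compiler needs a separate branch for each condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the && form from the text. */
int positive_at_and(const int *x, size_t i, size_t n)
{
    if (i < n && x[i] > 0)
        return 1;
    return 0;
}

/* Hypothetical helper: the nested form from the text. */
int positive_at_nested(const int *x, size_t i, size_t n)
{
    if (i < n)
        if (x[i] > 0)
            return 1;
    return 0;
}

/* Both forms must agree for any index, including out-of-range ones:
 * short-circuit evaluation guarantees x[i] is not read when i >= n. */
void check_forms_agree(const int *x, size_t i, size_t n)
{
    assert(positive_at_and(x, i, n) == positive_at_nested(x, i, n));
}
```

Calling check_forms_agree with an index past the end of the array is safe, since neither form touches x[i] in that case.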
On most processors, the result of a relational operation is placed in a condition flag or a condition-code register. If that result is then used as a value in an expression, it must first be moved to a general-purpose register. On many processors, this move is implemented with a branch. For instance, consider the following C code.
y = x + (i!=0);
On many processors, the (naive) code generated by the compiler will be equivalent to:
t = 0;
if( i != 0 ) {
    t = 1;
}
y = x + t;
# t = i != 0
set t,0
cmpne i,0
iffalse L1
L0:
set t,1
L1:
# y = x + t
add y,x,t
With optimization enabled, the generated code is more likely to be equivalent to:
y = x;
if( i != 0 ) {
    y = x + 1;
}
In either case, the compiler introduces a branch.
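As a rough sketch, the three forms discussed above can be written as C functions and compared. All function names here are hypothetical. The last function, add_flag_bits, is not taken from the text: it is one possible branch-free formulation, included on the assumption that a 32-bit integer is being tested, and it relies on the fact that for any nonzero 32-bit value either the value or its negation has the top bit set.

```c
#include <assert.h>
#include <stdint.h>

/* The original expression: y = x + (i != 0). */
int32_t add_flag_direct(int32_t x, int32_t i)
{
    return x + (i != 0);
}

/* The naive expansion: materialize the flag in t with a branch. */
int32_t add_flag_naive(int32_t x, int32_t i)
{
    int32_t t = 0;
    if (i != 0) {
        t = 1;
    }
    return x + t;
}

/* The optimized expansion: branch around the increment. */
int32_t add_flag_optimized(int32_t x, int32_t i)
{
    int32_t y = x;
    if (i != 0) {
        y = x + 1;
    }
    return y;
}

/* Assumption, not from the text: a branch-free variant.
 * (u | -u) has bit 31 set exactly when u != 0, so shifting
 * right by 31 yields 1 for nonzero i and 0 for zero. */
int32_t add_flag_bits(int32_t x, int32_t i)
{
    uint32_t u = (uint32_t)i;
    uint32_t t = (u | (uint32_t)(0u - u)) >> 31;
    return x + (int32_t)t;
}

/* All four forms should produce the same result
 * (callers must keep x + 1 within int32_t range). */
void check_add_flag_forms(int32_t x, int32_t i)
{
    int32_t expected = add_flag_direct(x, i);
    assert(add_flag_naive(x, i) == expected);
    assert(add_flag_optimized(x, i) == expected);
    assert(add_flag_bits(x, i) == expected);
}
```

Whether the bit-manipulation variant is actually faster depends on the processor and compiler; many compilers already turn the direct form into branch-free code when the target has a suitable instruction.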