V567. Modification of variable is unsequenced relative to another operation on the same variable. This may lead to undefined behavior.
The analyzer has detected an expression leading to undefined behavior. A variable is used several times between two sequence points while its value is changing. The result of such an expression is impossible to predict. Now, let's take a closer look at the concepts of "undefined behavior" and "sequence point".
Undefined behavior is a characteristic of certain programming languages—most notably C and C++—that allows for implementation flexibility. In other words, the specification deliberately does not define the result of certain operations, leaving their behavior up to the compiler implementation.
In C, for example, using an automatic variable before it is initialized yields undefined behavior, as does dividing by zero or indexing an array outside its bounds. This frees the compiler to do whatever is easiest or most efficient if such a program is submitted. In general, any behavior afterwards is also undefined. In particular, the compiler is not required to detect undefined behavior, so programs that invoke it may appear to compile and even run without errors at first, only to fail on another system or at a later date. As far as the language specification is concerned, anything, or nothing at all, can happen when undefined behavior occurs.
A sequence point in imperative programming defines any point in a program execution where it is guaranteed that all side effects of previous evaluations have been completed, and no side effects from subsequent evaluations have yet appeared.
They are often mentioned in the context of C and C++ because these languages make it particularly easy to write expressions whose values may depend on an undefined order of side effects. Adding one or more sequence points imposes a stricter order and is a way to achieve stable (i.e., correct) behavior.
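As a minimal sketch of adding sequence points, the two helpers below split an expression like a[i] = i++ into separate statements. The function names are hypothetical, and which of the two meanings the author of such an expression intended is an assumption that has to be settled case by case.

```cpp
#include <cassert>

// Hypothetical helper: write the old value of i at the old index, then advance.
void store_then_advance(int* a, int& i)
{
    assert(a != nullptr);
    a[i] = i;  // i is only read in this statement
    ++i;       // the increment is a separate full expression
}

// Hypothetical helper: advance first, then write the old value at the new index.
void advance_then_store(int* a, int& i)
{
    assert(a != nullptr);
    const int old = i;  // capture the value before the increment
    ++i;
    a[i] = old;
}
```

Each statement ends a full expression, so the order of the read and the increment no longer depends on the compiler or the language standard.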
Note that C++11 replaced sequence points with the terms "sequenced before/after", "sequenced", and "unsequenced". As a result, many expressions that led to undefined behavior in C++03 became well-defined (for example, i = ++i). These rules were further refined in C++14 and C++17. The analyzer issues the warning regardless of the standard in use, so for code built under a newer standard it may formally be a false positive. The fact that expressions like i = ++i are now well-defined is not a justification for using them. It is better to rewrite such expressions to make them more readable. Moreover, if support for an earlier standard is needed later on, such code can cause an issue that is difficult to debug.
i = ++i + 2; // undefined behavior until C++11
i = i++ + 2; // undefined behavior until C++17
f(i = -2, i = -2); // undefined behavior until C++17
f(++i, ++i); // undefined behavior until C++17,
// unspecified after C++17
i = ++i + i++; // undefined behavior
cout << i << i++; // undefined behavior until C++17
a[i] = i++; // undefined behavior until C++17
n = ++i + i; // undefined behavior
Sequence points come into play when the same variable is modified more than once within a single expression. An often-cited example is the i=i++ expression, which both assigns i to itself and increments it. What is the value of i? A language standard must specify one possible behavior as the only acceptable one, a range of acceptable behaviors, or that the program behavior in a given case is completely undefined. In C, and in C++ before C++17, evaluating such an expression yields undefined behavior because there is no sequence point between the two modifications of i.
C and C++ define the following sequence points:
- Between the evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the *p++ != 0 && *q++ != 0 expression, all side effects of the *p++ != 0 subexpression are completed before any attempt to access q.
- Between the evaluation of the first operand of the conditional operator (?:) and its second or third operand. For example, in the a = (*p++) ? (*p++) : 0 expression, there is a sequence point after the first *p++, meaning p has already been incremented by the time the second instance is evaluated.
- At the end of a full expression. This category includes expression statements (such as the a=b; assignment), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
- Before entering a called function. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the f(i++) + g(j++) + h(k++) expression, f is called with a parameter of the original i value, but i is incremented before entering the f body. Similarly, j and k are updated before entering g and h respectively. However, the order in which f(), g(), and h() are executed is not specified, nor is the order in which i, j, and k are incremented. So, inside the f body, j and k may or may not have been incremented yet. Note that calling the f(a, b, c) function with multiple arguments is not an instance of using the comma operator, nor does it determine the evaluation order of the argument values.
- At a function return, after the return value is copied into the calling context (this sequence point is explicitly described only in the C++ standard, not in C).
- In a declaration with initialization, when the evaluation of the initializing value is completed, e.g., when the evaluation of (1+i++) in int a = (1+i++); is completed.
- In C++, overloaded operators act as functions, so a call of an overloaded operator is a sequence point.
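The first two items, together with the comma operator, can be illustrated with a short sketch. The function names are hypothetical. Each expression modifies a variable and uses it again, yet stays well-defined because a sequence point separates the two uses.

```cpp
#include <cassert>

// &&: the left operand and its side effect on p are complete before q is touched.
// If the left operand is false, q is not advanced at all.
bool both_nonzero(const int*& p, const int*& q)
{
    assert(p != nullptr && q != nullptr);
    return *p++ != 0 && *q++ != 0;
}

// ?: : p has already been advanced when the second *p++ is evaluated.
int next_if_nonzero(const int*& p)
{
    assert(p != nullptr);
    return (*p++) ? (*p++) : 0;
}

// Comma operator: the left operand is fully evaluated before the right one.
int bump_then_double(int& i)
{
    return (++i, i * 2);
}
```

For instance, calling next_if_nonzero on a pointer to the array {1, 7, 9} returns 7 and leaves the pointer at the third element.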
Now let's take a look at several samples causing undefined behavior:
int i, j;
...
X[i]=++i;
X[i++] = i;
j = i + X[++i];
i = 6 + i++ + 2000;
j = i++ + ++i;
i = ++i + ++i;
It is impossible to predict the evaluation results in all these cases. Of course, these samples are synthetic, so the danger is obvious. Now, take a look at a code example taken from a real application:
while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31]))
{}
return (m_nCurrentBitIndex - BitInitial - 1);
The compiler can evaluate either the left or the right operand of the & operator first. It means that the m_nCurrentBitIndex variable might already be incremented by one when m_pBitArray[m_nCurrentBitIndex >> 5] is evaluated, or it might not be incremented yet.
This code may work well for a long time. However, it will behave correctly only when built using a specific compiler version with a fixed set of compilation options. This is the fixed code:
while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
Powers_of_Two_Reversed[m_nCurrentBitIndex & 31]))
{ ++m_nCurrentBitIndex; }
return (m_nCurrentBitIndex - BitInitial);
This code does not contain ambiguities anymore. The -1 magic constant was also eliminated.
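To try the fixed loop in isolation, here is a self-contained sketch. The BitReader struct, the ReversedPower helper (standing in for the Powers_of_Two_Reversed table), the CountLeadingZeroBits name, and the 32-bit word layout are all assumptions made for illustration. Only the member names come from the sample above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the class from the sample above.
struct BitReader
{
    const std::uint32_t* m_pBitArray;
    std::uint32_t m_nCurrentBitIndex;

    // Assumed equivalent of Powers_of_Two_Reversed[index & 31]:
    // a mask with one bit set, counted from the most significant bit.
    static std::uint32_t ReversedPower(std::uint32_t index)
    {
        return 0x80000000u >> (index & 31);
    }

    // Returns the number of zero bits skipped before the first set bit.
    // Precondition: a set bit exists at or after m_nCurrentBitIndex.
    std::uint32_t CountLeadingZeroBits()
    {
        assert(m_pBitArray != nullptr);
        const std::uint32_t BitInitial = m_nCurrentBitIndex;
        while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
                 ReversedPower(m_nCurrentBitIndex)))
        {
            ++m_nCurrentBitIndex;  // the only modification, in its own statement
        }
        return m_nCurrentBitIndex - BitInitial;
    }
};
```

One detail is worth checking when applying a fix of this kind: the original loop left m_nCurrentBitIndex one position past the set bit, whereas this version leaves it on the set bit. If later code relies on the old position, the index needs one more increment after the result is computed.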
Programmers often think that undefined behavior may occur only if they use postincrement, while preincrement is safe. This is not the case. Here is an example from a discussion on this subject.
The issue:
A user downloaded the PVS-Studio trial version, ran it on their project, and got the following warning: V567 Undefined behavior. The i_acc variable is modified while being used twice between sequence points.
The code:
i_acc = (++i_acc) % N_acc;
The user claimed that the code did not exhibit undefined behavior because the i_acc variable was only used once in the expression.
Although the probability of an error is rather small in this case, there is still undefined behavior here under the pre-C++11 rules. The = operator is not a sequence point. This means that the compiler may first put the value of the i_acc variable into a register, and then increment the value in the register. Then, it may evaluate the expression and write the result into the i_acc variable, and after that write the register with the incremented value into the same variable. As a result, we get code like this:
REG = i_acc;
REG++;
i_acc = (REG) % N_acc;
i_acc = REG;
The compiler has the absolute right to do so. Of course, in reality, it will most likely increment the variable value right away, and everything will be evaluated as a programmer expects. However, it is better not to rely on that.
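A version that does not rely on the compiler reads i_acc without modifying it inside the expression. This is a minimal sketch. The next_index name is hypothetical, and it assumes N_acc is a positive buffer size.

```cpp
#include <cassert>

// Hypothetical helper: advance a circular index without modifying it in place.
int next_index(int i_acc, int N_acc)
{
    assert(N_acc > 0);           // assumed precondition
    return (i_acc + 1) % N_acc;  // i_acc is only read here
}
```

At the call site this becomes i_acc = next_index(i_acc, N_acc); or simply i_acc = (i_acc + 1) % N_acc;, where the only modification of i_acc is the assignment itself.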
Here is one more case involving function calls.
The order of evaluating function arguments is unspecified. If a variable is modified in one argument and used in another, the result is unpredictable: it is undefined behavior before C++17 and unspecified behavior starting with C++17. Look at this sample:
int A = 0;
Foo(A = 2, A);
The Foo function may be called both with the (2, 0) arguments and with the (2, 2) arguments. The order of function argument evaluation depends on the compiler and optimization settings.
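One way to remove the ambiguity is to do the modification in its own statement and pass plain values. The sketch below assumes a Foo that simply returns its arguments. That signature and the two wrapper functions are hypothetical, and they show the two possible intents.

```cpp
#include <cassert>
#include <utility>

// Hypothetical Foo: returns its arguments so the call can be inspected.
std::pair<int, int> Foo(int first, int second)
{
    return {first, second};
}

// Intent "pass the new value and the old value".
std::pair<int, int> call_with_old_value()
{
    int A = 0;
    const int old = A;  // read the old value in its own statement
    A = 2;              // modify in its own statement
    return Foo(A, old);
}

// Intent "pass the new value twice".
std::pair<int, int> call_with_new_value()
{
    int A = 0;
    A = 2;
    return Foo(A, A);
}

// Self-check: both results are fixed regardless of argument evaluation order.
void check_foo_calls()
{
    assert(call_with_old_value() == std::make_pair(2, 0));
    assert(call_with_new_value() == std::make_pair(2, 2));
}
```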
References
- Wikipedia. Undefined behavior.
- Wikipedia. Sequence point.
- Klaus Kreft & Angelika Langer. Sequence Points and Expression Evaluation in C++.
- Discussion at Bytes.com. Sequence points.
- Discussion at StackOverflow.com. Why is a = (a+b) - (b=a) a bad choice for swapping two integers?
- cppreference.com. Order of evaluation.
You can look at examples of errors detected by the V567 diagnostic.