Andrey Karpov

Mar 16 2023

Tags:

#StaticAnalysis

What static analysis cannot find

Static code analysis
Lack of high-level information about code
Computational complexity
Conclusion

Static code analysis is valuable because it helps detect errors in the early stages of development. However, it is not omnipotent: a number of limitations prevent it from detecting every kind of error. Let's dig deeper into the topic.

Static code analysis

A static code analyzer takes the source code of a program as input and outputs warnings that point to anomalies in the code. The developer reviews the warnings and decides whether the code needs to be fixed or is actually fine and needs no changes.

Static analysis tools use rather complex technologies to detect errors: data flow analysis, symbolic execution, interprocedural analysis, and so on. You can find a more detailed description in the article "PVS-Studio: static code analysis technology".

However, static code analyzers have two weak points:

False positives. There is no error, but the analyzer issues a warning. The programmer wastes time investigating a non-existent error.
False negatives. There is an error, but the analyzer issues no warning. Some real bugs and vulnerabilities go undetected by the analyzer and have to be found in other ways.

The listed weak points of the static analysis methodology are in fact unavoidable. Let's now look at two reasons why they occur.

Lack of high-level information about code

Looking at code alone, it is often impossible to say whether there is an error or not. Neither the analyzer nor a human reader can do this. To understand whether code contains an error, you need to know what the author of the code intended to write. In other words, you need to know what program behavior is expected.

Here is the simplest abstract example:

#include <math.h>

double Calc(double a, double b)
{
  return a * sin(b);
}

Is there an error? Who knows. The author may have made a mistake and in fact the formula should use a function to calculate cosine, not sine.

In other cases, on the contrary, the analyzer may issue a warning for code that is actually correct. For example, it can issue a warning for code like this:

int Foo(int x, int y)
{
  if (x == 0)
    x++; y++;
  return x * y;
}

The code looks very strange. Chances are high that the curly braces were forgotten here, and the analyzer rightly draws attention to this code. You shouldn't write code this way, but you cannot say for sure that it doesn't work as intended. Only the author of the code can say whether it should look like this:

int Foo(int x, int y)
{
  if (x == 0)
  {
    x++;
    y++;
  }
  return x * y;
}

Or like this:

int Foo(int x, int y)
{
  if (x == 0)
    x++;
  y++;
  return x * y;
}
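To see why the question matters, here is a small sketch that puts the two readings side by side. The names FooBraced and FooUnbraced are made up for this illustration and do not appear in any real project.

```cpp
// Reading 1: both increments belong to the if statement.
int FooBraced(int x, int y)
{
  if (x == 0)
  {
    x++;
    y++;
  }
  return x * y;
}

// Reading 2: only x++ is conditional. This is how the compiler
// parses the original version with both statements on one line.
int FooUnbraced(int x, int y)
{
  if (x == 0)
    x++;
  y++;
  return x * y;
}
```

The two functions return the same result when x is 0 and different results otherwise. For example, with x = 3 and y = 4 the first one returns 12 and the second one returns 15. So a quick experiment with x equal to 0 would not tell you which behavior the author wanted.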

As we found out, one of the limitations of static code analysis stems from a lack of information about how the program should actually work. There are several ways to compensate for this shortcoming:

Classic code review. The author of code and their teammates know how the program should work and can detect errors using their knowledge. In other words, they can check from a high-level perspective that the code does exactly what it is intended to do.
Unit tests. It is possible to detect an error in the Calc function discussed above by revealing that its result on the test data differs from the control values.
Manual application testing. The tester may notice that the program does not behave as it should.
Formal verification. A special language formally describes how a program should behave. After that, it is checked that the logic implemented in the code conforms to the formal description. This approach has not been widely used, due to the large labor costs. Sometimes it turns out that the formal description of a function is larger and more complex than the function body itself.
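As an illustration of the unit test idea, here is a minimal sketch. It assumes, purely for the sake of the example, that the author of Calc meant to use cosine. The helper MatchesControl and the chosen test data are hypothetical.

```cpp
#include <cmath>

// The function from the example above, unchanged.
double Calc(double a, double b)
{
  return a * std::sin(b);
}

// Hypothetical check: returns true if Calc agrees with the control
// value within a small tolerance.
bool MatchesControl(double a, double b, double expected)
{
  return std::fabs(Calc(a, b) - expected) < 1e-9;
}
```

If the intended formula is a * cos(b), the control value for a = 2 and b = 0 is 2. Calc returns 0 for this input, so MatchesControl(2.0, 0.0, 2.0) yields false and the test exposes the error. The analyzer has no way of knowing that 2 is the expected answer, but the author of the test does.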

Computational complexity

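The capabilities of a static analyzer are fundamentally limited by the halting problem: no algorithm can decide, for every program and every input, whether the program will terminate, so in the general case an analyzer cannot compute exactly what a program will do. However, even without considering such extreme cases, static analyzers are limited by the available computing power.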

Let's look at the body of a simple function:

void foo(int *p)
{
  *p = 1;
}

Considering only this function, it is impossible to say whether or not a null pointer dereference can occur here.

This is where the search for a compromise begins: to issue a warning that is likely to be false, or to remain silent and not report a possible problem.

Following the first path, the analyzer can issue warnings at the foo function body and encourage programmers to always write checks just in case:

void foo(int *p)
{
  if (p != NULL)
  {
    *p = 1;
  }
}

However, this is a dead end as the analyzer begins to generate a large number of barely useful warnings. Instead of finding real bugs, the analyzer forces the programmer to "fight the warnings".

It is more useful to track whether a null pointer can actually be passed to the foo function somewhere. For example, the function may be used only in one place as follows:

void X(int *p)
{
  if (p == NULL)
    return;
  foo(p);
}

Then there is definitely no error. There is no need to issue a warning. But this is a very simple case. In practice, the function can be called from different places, and the pointer can be passed through many calls.
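Here is a sketch of a slightly less trivial case, where a null pointer can reach foo through another function. Find and ReplaceWithOne are hypothetical names invented for this illustration.

```cpp
#include <cstddef>

// The function from the example above: dereferences p unconditionally.
void foo(int *p)
{
  *p = 1;
}

// Hypothetical helper: returns a pointer to the first element equal
// to key, or nullptr if there is no such element.
int *Find(int *items, std::size_t count, int key)
{
  for (std::size_t i = 0; i < count; i++)
    if (items[i] == key)
      return &items[i];
  return nullptr;
}

// Hypothetical caller: returns true if an element was overwritten.
bool ReplaceWithOne(int *items, std::size_t count, int key)
{
  int *p = Find(items, count, key);
  if (p == nullptr)  // without this check, foo(p) could dereference nullptr
    return false;
  foo(p);
  return true;
}
```

To decide whether the call to foo is safe, the analyzer has to look inside Find, notice that it may return a null pointer, and check that the caller handles this case. With the check in place there is nothing to report. If the check is removed, a warning would be justified, but it can only be produced by an analysis that spans all three functions.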

It is even more difficult to track possible values of global variables or array elements. In this case, the static analyzer, in some sense, begins to emulate the execution of the program code to understand the possible values of variables. And the deeper and more accurately the analyzer does it, the more memory and time it needs. The static analyzer does not know what the input data may be, so ideally it should sort through all possible options. If we have a large software project, then it can only do this by consuming some incredible amount of memory and working for millennia.
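A rough back-of-the-envelope sketch shows how quickly the number of options grows. It assumes the simplest possible model: a piece of code with several independent two-way conditions placed one after another. The function PathCount is hypothetical and exists only to do the arithmetic.

```cpp
#include <cstdint>

// Number of distinct execution paths through `branches` independent
// two-way conditions placed one after another: 2 to the power of branches.
// Returns 0 when the result does not fit into 64 bits.
std::uint64_t PathCount(unsigned branches)
{
  if (branches >= 64)
    return 0;
  return std::uint64_t{1} << branches;
}
```

Ten such conditions give 1024 paths, and forty already give more than a trillion. Loops, recursion, and values that depend on input data only make the picture worse, which is why real analyzers have to merge states and cut the exploration short.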

Of course, no one wants an analyzer that will check the code longer than people will be using the resulting program :). Therefore, a trade-off has to be made between the accuracy of the analysis and the running time. In practice, there is an implicit requirement for analyzers to check a project in a few hours. For example, during a nightly build.

Incidentally, input data analysis raises a separate topic related to detecting potential vulnerabilities. Analyzers use a special technology for this, called taint analysis (taint checking).
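To give an idea of what taint analysis looks for, here is a minimal sketch of data coming from outside the program and reaching a sensitive operation. The function ElementFromUserInput and its parameters are hypothetical and are not taken from any analyzer's documentation.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical lookup. `text` comes from outside the program, so the
// number parsed from it is considered tainted.
// Returns the selected table element, or -1 if the index is invalid.
int ElementFromUserInput(const int *table, std::size_t size, const char *text)
{
  char *end = nullptr;
  long index = std::strtol(text, &end, 10);  // tainted value
  if (end == text || index < 0 || static_cast<std::size_t>(index) >= size)
    return -1;                               // validation rejects bad input
  return table[index];                       // sink: array indexing
}
```

The idea is to track the value from the source (the text supplied by the user) to the sink (the array access). In this sketch the range check stands between them, so the access is safe. Without that check, an externally controlled index would reach the array, and this is the kind of flow that taint analysis is meant to report.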

It is possible to compensate for the described limitations in computing power by using other methods, for example, dynamic code analysis. It is not a matter of "static analysis vs dynamic analysis" or "static analysis vs unit tests". Each methodology has its own strengths and weaknesses. High code quality and reliability are achieved by using them together.

Conclusion

Static analyzers are not omnipotent, but they are good helpers. They can detect many errors, typos, and unsafe constructs even before the code review stage. It is much more useful for code reviewers to focus on discussing high-level flaws and sharing knowledge than to hunt for a typo in a boring comparison operator. Moreover, as our practice shows, such errors are still very difficult for people to notice; see the article "The evil within the comparison functions". Let a static code analyzer do this boring job.
