The first static analysis report: the key problems and how to address them

Dec 09 2021

The main purpose of a static analyzer is to detect and report errors in code so that you can fix them afterwards. However, reporting errors is not as simple as it may seem. Those just starting out with static analysis - and even experienced developers - may encounter a number of problems. In this article, I'll talk about these problems and how to deal with them.

[Image: 0896_FirstReportSA/image1.png]

What is static analysis?

Static analysis is a technology that lets you find errors and potential vulnerabilities in source code without actually executing it. Here's an example of how this works. Not long ago, the world welcomed the LLVM 13.0.0 release, and we checked it for errors. Sure enough, we found some curious ones in this project. Here's one of them:

bool operator==(const BDVState &Other) const {
  return OriginalValue == OriginalValue && BaseValue == Other.BaseValue &&
    Status == Other.Status;
}

The analyzer issues the following warning for the code above:

[CWE-571] There are identical sub-expressions to the left and to the right of the '==' operator: OriginalValue == OriginalValue RewriteStatepointsForGC.cpp 758

As you can see, someone made a typo in the first comparison and wrote OriginalValue instead of Other.OriginalValue. That's a classic error in comparison methods, and it can lead to all sorts of unpleasant consequences.
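
For reference, the fixed comparison would presumably look like this (the correction follows directly from the warning; I haven't checked the actual LLVM patch):

bool operator==(const BDVState &Other) const {
  return OriginalValue == Other.OriginalValue && BaseValue == Other.BaseValue &&
    Status == Other.Status;
}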

Problems in the first report

So, suppose you found out that there is such a thing as static analysis and decided to try it. You download the tool, run the analysis on your project, and get a result that looks approximately like this:

[Image: 0896_FirstReportSA/image2.png]

This is the PVS-Studio plugin's window in Visual Studio 2022. The window contains a set of buttons at the top and displays a list of warnings. What the warnings say is irrelevant right now. I want to draw your attention to something else - the number of warnings shown:

[Image: 0896_FirstReportSA/image4.png]

And that's where the first problem lies - the sheer number of warnings. In this example, there are over 2,000 of them. That's a lot! Someone who is just testing the waters does not want to go through so many warnings - it's tedious. Meanwhile, that list may contain a warning about a major, very real error that can shoot you in the foot in the near future and force you to spend lots of time, energy, and brain cells fixing it.

Well, suppose you decide to look through that long list after all. And here you'll face problem number two - identical warnings:

[Image: 0896_FirstReportSA/image6.png]

From the analyzer's perspective, everything is fine: it found many instances of the same error in the code and reports just that. However, right now we're talking about someone using the analyzer for the first time. What they want is variety. Looking at four identical warnings in a row is discouraging.

Now it's time for the third problem - the most significant one: false positives. A false positive is a warning issued for code that does not actually contain an error.
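
Here's a classic illustration (my own example, not one from the report above). Consider code that trips an "identical sub-expressions" pattern on purpose:

// Deliberate self-comparison: under IEEE 754, NaN is not equal to
// anything, including itself, so this is a valid NaN check.
bool isNaN(double x) {
  return x != x;
}

A naive analyzer that flags every comparison with identical operands would report this perfectly correct code - a textbook false positive. Mature analyzers special-case such idioms, but no tool avoids false positives entirely.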

Suppose the first ten warnings are false positives and the rest are legitimate and helpful. The person starts looking through the warnings one by one, sees these ten false positives, concludes the list is nonsense, and closes the report. All the while, warning number 11 may be an error like the one described at the beginning - the kind that shoots you in the foot when you least expect it.

How to address the first report's problems

To solve the problems described above, I'd like to offer a good old weight-based approach. See, the static analyzer has diagnostic rules, each of which responds to a specific error pattern. So first, you sit down, analyze all the diagnostics, and assign each an initial weight. The best diagnostics get more weight, the worst ones less. What makes a diagnostic one of the best? It has a low chance of producing a false positive, detects a serious error, or targets a particularly distinctive pattern. How exactly to rank them is up to you.

However, static initial weights alone won't take you far. Let's take it a step further. After you collect all the warnings for a project, you can adjust these weights. I'll highlight several criteria here (a small code sketch follows the list):

  • the warning's category;
  • warnings for tests;
  • the number of warnings issued per line in a specified file;
  • how often a specific diagnostic is triggered.
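
Before diving into each criterion, here's a minimal data sketch of what such a mechanism operates on. The struct, field names, and types are all hypothetical - my illustration, not PVS-Studio's actual internals:

#include <string>

// Reliability groups, matching the High/Medium/Low split described below.
enum class Level { High, Medium, Low };

// One analyzer warning, carrying everything the re-weighting
// criteria need to look at.
struct Warning {
  std::string diagnostic;  // diagnostic rule id, e.g. "V501"
  Level       level;       // reliability group of the diagnostic
  std::string file;        // file the warning points to
  int         line;        // line the warning points to
  double      weight;      // starts at the diagnostic's initial weight
};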

To make this clearer, I'll use an example. Here's how we implemented this mechanism - we called it Best Warnings - at PVS-Studio. I'll start from the beginning.

The warning's category

This weight adjustment relies on an existing system that groups warnings by reliability. For example, we break our diagnostics into three groups: High, Medium, and Low. High stands for the most reliable warnings; we do not alter their weight. The Medium group contains warnings of average reliability; to these, we apply a coefficient that lowers their weight. Low, the least reliable group, gets an even stronger lowering coefficient.

Warnings for tests

This group includes warnings issued for tests or autogenerated files. Sometimes the analyzer finds real errors in tests, and that's very cool. In most cases, however, warnings issued for such code are false positives caused by that code's specifics. This is why warnings for tests are better left for later: there's no need to show them to someone just starting out with an analyzer.

The number of warnings issued per line

Sometimes one line of code triggers several diagnostics at once. Experience shows that such lines usually do contain an error, so it's logical to add weight to these warnings.

How often a specific diagnostic is triggered

If one diagnostic's warnings make up more than N% of the whole report, this usually means those warnings are of fairly low value. Such behavior is usually related to coding style or to carelessly written macros in C or C++ code.

All the criteria described above help solve problem number three. To address the second problem and get a variety of warnings, take all warnings of a given diagnostic and find the heaviest one among them. Leave its weight unchanged and decrease the weight of the remaining ones. Do the same for each diagnostic, and the resulting report becomes varied.

Now only the first problem remains: there are too many warnings. To address it, sort all warnings by weight and show the N heaviest ones. This way, you analyze the report instead of your clients and hand them the very essence of what the analyzer found.
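
Putting together all four criteria plus the variety and top-N steps, the whole pipeline might look like the following sketch. It builds on the hypothetical Warning struct above; every coefficient and threshold here is a made-up placeholder, not PVS-Studio's real value:

#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

std::vector<Warning> selectBestWarnings(std::vector<Warning> all, size_t topN) {
  // 1. Category: High keeps its weight, Medium and Low are damped.
  for (auto &w : all) {
    if (w.level == Level::Medium) w.weight *= 0.75;  // placeholder coefficients
    if (w.level == Level::Low)    w.weight *= 0.5;
  }

  // 2. Tests and autogenerated files: push their warnings down.
  for (auto &w : all)
    if (w.file.find("test") != std::string::npos) w.weight *= 0.1;

  // 3. Several warnings on one line usually mean a real error:
  //    boost every warning that shares its file:line with another one.
  std::map<std::pair<std::string, int>, int> perLine;
  for (const auto &w : all) ++perLine[{w.file, w.line}];
  for (auto &w : all)
    if (perLine[{w.file, w.line}] > 1) w.weight *= 1.5;

  // 4. A diagnostic that produces more than N% of the whole report
  //    is usually low-value (style issues, noisy macros): damp it.
  std::map<std::string, int> perDiagnostic;
  for (const auto &w : all) ++perDiagnostic[w.diagnostic];
  for (auto &w : all)
    if (perDiagnostic[w.diagnostic] > 0.1 * all.size()) w.weight *= 0.2;

  // Variety: within each diagnostic, the heaviest warning keeps its
  // weight and the rest are damped, so no single rule dominates.
  std::sort(all.begin(), all.end(),
            [](const Warning &a, const Warning &b) { return a.weight > b.weight; });
  std::map<std::string, bool> seen;
  for (auto &w : all) {
    if (seen[w.diagnostic]) w.weight *= 0.5;
    seen[w.diagnostic] = true;
  }

  // Final cut: re-sort by the adjusted weight and keep only the top N.
  std::sort(all.begin(), all.end(),
            [](const Warning &a, const Warning &b) { return a.weight > b.weight; });
  if (all.size() > topN) all.resize(topN);
  return all;
}

The two sorting passes are deliberate: the first guarantees that, within each diagnostic, the warning whose weight survives intact really is the heaviest one; the second orders the final report after all the adjustments.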

How it all looks in PVS-Studio

Our Visual Studio plugins now contain a button that executes all of the above: it runs a script we created. Let's get back to the example above and run it through this mechanism:

[Image: 0896_FirstReportSA/image8.png]

Here's the report. Now, instead of examining it at once, we click two buttons. One:

[Image: 0896_FirstReportSA/image9.png]

And two:

[Image: 0896_FirstReportSA/image11.png]

After this, the report looks as follows:

[Image: 0896_FirstReportSA/image13.png]

As you can see, only 10 warnings remain. They are much easier to go through than 2000 :)

We have also made sure that the list contains a variety of warnings: you can see diverse diagnostic codes, and we decided to allow no more than two warnings of the same kind.

Conclusion

Taking all of the above into account, the weight mechanism solves several problems at once. Looking through the first report becomes easier: the volume of warnings is reduced to something convenient to examine, and the warnings themselves become more reliable and diverse. Clients save time and can evaluate the tool faster. For example, it took me only about 10-15 minutes to install the analyzer and view the warnings.

Feeling curious? Click here to see how it works in real life!

Thank you for reading!
