The Minefields of MISRA Coverage


Modern static application security testing (SAST) tools are typically used for two main purposes: finding bugs, and finding violations of coding standards. The primary purpose of CodeSonar is the former—it was originally designed to find serious safety and security defects such as memory errors, API misuse, and concurrency issues; however, it is also perfectly capable of being used for the latter, including the most popular coding standard, MISRA C 2012.

When developers are required to adhere to coding standards, they look for a tool that can help them find violations. One of the metrics they use to compare tools is the coverage: the proportion of rules that the tool claims to check, with a naïve strategy being to choose the tool that claims the highest coverage of the standard.

Unfortunately, the notion of coverage is not well-defined, and because there is no reliable source of information that can be used to compare coverage between tools, customers must trust vendors to interpret the term reasonably and to report their coverage fairly. Worse, some vendors unscrupulously exaggerate coverage in order to gain a competitive advantage, and in doing so also confuse consumers.

In this article, I shed some light on why coverage is a slippery notion, which I hope will in turn help customers make informed decisions about which tool to select. 

Some Coverage is Easy

Some rules are so simple that it is straightforward to write a checker that can find all violations with no false positives. For such rules, coverage is easy—the tool can either find violations or not, with no middle ground in between. In MISRA C 2012, such rules are labeled Decidable. If the violation can be detected by looking only at one compilation unit, the rule is also labeled Single Translation Unit. For example, rule 4.2 forbids the use of trigraphs, and is labeled this way. If a tool claims to have coverage of this category of rules, then it is perfectly reasonable to believe that claim.

Things get a little murkier, however, if a violation can only be reliably found when the tool looks at multiple compilation units at the same time (these rules are labeled System in the standard). For example, MISRA C 2012 rule 5.1: “External identifiers shall be distinct” is certainly decidable, but the only way a tool can reliably find a violation is if it examines all compilation units and compares all such identifiers found in each.

If a tool claims to have full coverage of a rule with System scope, then it is only reasonable to believe that claim if the tool is also capable of finding all the compilation units that contribute to the program. Over-approximation and under-approximation of that set can lead to both false positives and false negatives. Humans routinely get this wrong, so a user of a tool that does not offer an automatic way to determine the set is running the risk of getting incorrect results.

Automatic techniques are surprisingly tricky to get right. The most effective approach is one that integrates tightly with the build system, as that is most often the most trustworthy source.

Undecidability

In MISRA C 2012, some rules are labeled “Undecidable,” meaning that it is fundamentally impossible to have a method that can, in general, say for sure if a violation is present or not. Because of this property, the author of a checker must find a sweet spot that balances the risk of false positives with the risk of false negatives. Most of these rules require an analysis that is capable of reasoning about the execution of the program, so only the most sophisticated SAST tools can be expected to do a good job. A good example is MISRA C 2012 rule 17.2, which outlaws recursion, both direct and indirect (i.e., calls through function pointers).

The trouble is, claims of coverage are often made that ignore whether a tool is good or bad at finding violations. If a tool can only find the most obvious and superficial instances of violations, is it reasonable for it to claim that it has coverage of that rule? The other side of that coin is interesting to consider too — if a tool finds all violations, but also reports so many false positives that it is impractical to inspect them all, is it fair to say that it has coverage?

Coverage Breadth

The final aspect of rule coverage that makes it complicated is that coding standards are usually quite loosely defined, whereas SAST tools must have a precise definition of the properties that they are looking for. Consequently, it is common for a checker to detect a property that is either a superset or a subset of what the rule requires.

For example, let’s consider CodeSonar’s coverage of MISRA C 2012 Rule 2.2: “There shall be no dead code.” For the purposes of this rule, dead code is code that is executed, but whose removal cannot affect program behavior. CodeSonar has an Unused Value checker, which finds places where a variable is assigned a value that is never subsequently used. All such places violate the MISRA rule, but there are other ways in which the rule can be violated that are not detected by this checker. Thus, the Unused Value checker covers only a subset of what the rule specifies, and other CodeSonar checkers fill in the gaps.

In some cases, the rule and the checker are not in a strict subset/superset relationship. They may overlap a lot or a little, or the checker may detect a property that is not a direct violation but is very likely to lead to a violation of the rule.

In CodeSonar, our policy is to claim coverage only if there is a large overlap between what the rule specifies and what our checker will find, and where the checker does not yield warnings that would be reasonably judged to be false positives for that rule (notwithstanding that they may be true positives otherwise).

One Rule to Ring Them All

There is one MISRA C 2012 rule in particular for which this issue is acute. Alarm bells should go off if you see a SAST tool (particularly one of the superficial tools) claim coverage of MISRA C 2012 Rule 1.3: “There shall be no occurrence of undefined or critical unspecified behavior.” This rule is so broad that it needs an additional 10-page Appendix that lists some of the specific things to avoid. This in turn references the C standards: those for C90/99 enumerate 230 instances of undefined behavior (65 of these are not covered by any other MISRA rule), and 51 instances of critical unspecified behavior (of which 17 are not covered by any other MISRA rule). Furthermore, the rationale for Directive 4.1 adds: “the presence of a run-time error indicates a violation of Rule 1.3.”

Consequently, Rule 1.3 specifies an enormous amount of forbidden behavior, including null pointer dereferences, buffer overruns, use of uninitialized memory, data races, use after free errors, and many of the other hazards of programming in C.

The problems with rule coverage claims should be clear — although this one rule (of the 143 in the standard) constitutes only 0.7% of the standard, it covers maybe 50% of the truly unpleasant kinds of failures that can befall a C program.

Furthermore, if a tool is to claim coverage of Rule 1.3, it should have checkers that have a good-sized intersection with all of those undesirable behaviors. If a tool can only find a tiny percentage of them, then it is unreasonable for it to claim coverage.

Many SAST tools that claim to find violations of MISRA rules are fairly superficial tools (such as those in the Lint family), and as such, they are very weak at finding violations of 1.3, even though they claim coverage. In contrast, advanced SAST tools such as CodeSonar are explicitly designed to find the kinds of run-time errors that constitute violations of Rule 1.3. Analyses that make it possible for them to find such defects with reasonable precision must be whole-program, path-sensitive, aware of hazardous information-flows, and capable of reasoning about concurrently-executing threads.

Evaluating MISRA Coverage Claims

In conclusion, let me summarize the most important points:

  1. There is no widely-accepted good definition of MISRA coverage, even for rules that are decidable.
  2. It is fundamentally impossible to have a perfect checker for a rule that is undecidable. False positives and false negatives for these are unavoidable.
  3. Checkers and rules do not always intersect perfectly.
  4. Claims of high coverage of MISRA rules should not be taken at face value. Vendors have penalty-free incentive to exaggerate.
  5. MISRA C 2012 Rule 1.3 encompasses a vast amount of behavior to be avoided. Only sophisticated SAST tools whose primary purpose is to find such bugs are good at finding violations of this rule.

When making a decision on which tool to use, the questions one must ask are: “What are the real problems in my code? Are they just coding standard violations, or real bugs? How well does this tool really find the problems I need to find?” The best way to answer these questions is to try the tool on your own code, and evaluate the results rationally.

To learn more, we welcome you to download and read this white paper, “Accelerating MISRA Automotive Safety Compliance with Static Application Security Testing.”
