Undefined Behaviors in ISO-C: Their Effect on Your Embedded Software Part 2

Feb­ru­ary 22, 2018

In Part 1 of this blog, I defined what I mean by unde­fined behav­iors and pro­vid­ed some exam­ples. I also showed how dif­fi­cult it is to man­age some of these behav­iors and tick­led your inter­est by hint­ing that there may be a pos­i­tive use to these behav­iors. Let’s con­tin­ue!

How C compilers (ab)use undefined behavior

The pur­pose of an opti­miz­ing com­pil­er is to min­i­mize the time need­ed to exe­cute the pro­gram and to min­i­mize the size of the exe­cutable code, with­in the con­straints imposed by the ISO-C lan­guage spec­i­fi­ca­tion.

Most opti­miz­ing com­pil­ers will remove the fol­low­ing code frag­ment.

for ( int i=0; i<N; i++ );

In the (far) past such con­structs were used to cre­ate a delay in pro­gram exe­cu­tion. Today most com­pil­ers detect that this loop does not affect the observ­able state of the pro­gram and will not emit object code for the loop.

It is easy to trigger undefined behavior

The results print­ed by the fol­low­ing pro­gram may dif­fer when com­piled with dif­fer­ent com­pil­ers or with the same com­pil­er using dif­fer­ent opti­miza­tion set­tings.

int main () {

   int arr [] = { 0, 2, 4, 6, 8 };

   int i = 1;

   printf (“%d\n”, i + arr[++i] + arr[i++]); // eval­u­a­tion order is unde­fined

}

Although the results are dif­fer­ent none is incor­rect. This is due to ISO/IEC 9899:2011 para­graph 6.5p2 (Unfor­tu­nate­ly the offi­cial ISO/IEC 9899:2011 stan­dard doc­u­ment is not free down­load­able. Use this link to access a draft stan­dard. Para­graph num­bers and text are often the same.), which spec­i­fies that unde­fined behav­ior occurs if there are mul­ti­ple allow­able order­ings of the subex­pres­sions of an expres­sion. There­fore in above exam­ple the value of “i” is like­ly (The behav­ior is unde­fined so other val­ues are also pos­si­ble.) either 1, 3 or 2, depend­ing on whether the com­pil­er eval­u­ates the expres­sion from left to right, right to left, or in anoth­er way. The value of “arr[++i]” is like­ly either 4, 6 or 8, and the value of “arr[i++]” is like­ly either 2, 4, or 6, and the sum is derived from an arbi­trary com­bi­na­tion of these val­ues.

Compilers may exploit undefined signed integer overflow behavior

Con­sid­er the fol­low­ing code frag­ment.

int foo(unsigned char x) {    // 1

    int val = 10;         // 2

    val += x;            // 3

    if (val < 10)        // 4

           bar();        // 5

    return val;         // 6

}                    // 7

Some com­pil­ers will opti­mize away the tests “if (val < 10)” and the call to func­tion “bar()”. A com­pil­er is per­mit­ted to per­form this opti­miza­tion since ISO/IEC 9899:2011 para­graph 6.5.p5 spec­i­fies: “If the result of an expres­sion is not in the range of rep­re­sentable val­ues for its type, the behav­ior is unde­fined.”

The value of “x” can­not be neg­a­tive and, given that signed inte­ger over­flow is unde­fined behav­ior, the com­pil­er can assume that at line 4 the value of “val” is always equal or greater than 10. Thus the “if” state­ment and the call to the func­tion “bar()” can be ignored by the com­pil­er since the “if” has no side effects and its con­di­tion will never be sat­is­fied. The opti­miza­tion as per­formed in this exam­ple seems harm­less, but it can have unde­sired side effects as shown in a next exam­ple.

Safe and unsafe arithmetic overflow occurrence test

Arith­metic over­flow checks for signed addi­tion can be imple­ment­ed in sev­er­al ways. A safe way is to com­pare against a known thresh­old by writ­ing some­thing like “if (a > INT_MAX – 100) OVERFLOW;”.

How­ev­er “expe­ri­enced” C pro­gram­mers may take 2-com­ple­ments over­flow for grant­ed and test whether the sum of the num­bers is small­er than one of the addend by writ­ing “if(a + 100 < a) OVERFLOW;”.

In the lat­ter case an opti­miz­ing com­pil­er may assume that the if state­ment is always false and emit no object code. A com­pil­er is per­mit­ted to per­form this opti­miza­tion because the value of “a+100” is always equal or greater than “a” unless an over­flow occurs in which case the behav­ior is unde­fined and any­thing is allowed to hap­pen, includ­ing ignor­ing this case.

C compilers may silently discard some wraparound checks

This exam­ple is taken from the CERT Vul­ner­a­bil­i­ty Notes Data­base. Some C com­pil­ers will opti­mize away the point­er arith­metic over­flow tests from the fol­low­ing code frag­ment with­out pro­vid­ing a prop­er diag­nos­tic mes­sage. These com­pil­ers assume that the value of “buf+len” is always equal or greater than the value of  “buf”.

char *buf;

int   len;

[…]

len = 1«30;

[…]

if(buf+len < buf)  /* wrap check */

[…over­flow occurred…]

A com­pil­er is per­mit­ted to per­form the opti­miza­tion since ISO/IEC 9899:2011 para­graph 6.5.6p8  spec­i­fies: “When an expres­sion that has inte­ger type is added to or sub­tract­ed from a point­er, …, the eval­u­a­tion shall not pro­duce an over­flow; oth­er­wise, the behav­ior is unde­fined.”

Wrap­ping checks that use meth­ods sim­i­lar to the one described above depend on unde­fined behav­ior and may be vul­ner­a­ble to buffer over­flows. How­ev­er the opti­miza­tion applied by the com­pil­er is not the root cause of the prob­lem. A com­pil­er designed for safe­ty crit­i­cal soft­ware develop­ment shall emit a diag­nos­tic mes­sage in such cases and warn the pro­gram­mer about poten­tial side effects of the applied opti­miza­tion.

After such warn­ing the pro­gram­mer can (eas­i­ly) solve the prob­lem by cast­ing the objects of type “char *” to “uintptr_t” before com­par­i­son. The faulty wrap­ping check list­ed above should be rewrit­ten as:

#include <stdint.h>

[…]

if((uintptr_t)buf+len < (uintptr_t)buf)

[…]

Notice that only signed over­flow is unde­fined, there­fore it is advised to use unsigned inte­gers to track buffer sizes.

C compilers may silently discard some null pointer checks

Some C com­pil­ers will opti­mize away the NULL point­er check at line 3 from the fol­low­ing code frag­ment with­out pro­vid­ing a prop­er diag­nos­tic mes­sage.

int foo(struct bar *s ) {           // 1

int x = s->f;            // 2

if (!s) return ERROR        // 3

[… use s …]            // 4

}

A com­pil­er is per­mit­ted to per­form this opti­miza­tion. At line 2 the point­er “s” is deref­er­enced, and unde­fined behav­ior occurs if the value of point­er is NULL, there­fore the com­pil­er can assume that “s” is not NULL; Under this assump­tion the if state­ment at line 3 is always false and the com­pil­er will not emit­ted object code for line 3 and remove the code that was sup­posed to detect and pre­vent unde­fined behav­ior.

Also in this case the opti­miza­tion applied by the com­pil­er is not the root cause of the prob­lem. A com­pil­er designed for safe­ty crit­i­cal soft­ware develop­ment shall emit a diag­nos­tic mes­sage in such cases and warn the pro­gram­mer about poten­tial side effects of the applied opti­miza­tion.

After such warn­ing the pro­gram­mer can solve the prob­lem by mov­ing the NULL point­er check before the point­er deref­er­ence. The faulty check list­ed above should be rewrit­ten as:

int foo(struct bar *s ) {           // 1

int x;                // 2

if (!s) return ERROR        // 3

x = s->f;                // 4

[…]                    // 5

}

Other undefined behaviors that may lead to removal of code fragments

Shift­ing a n-bit inte­ger by n or more bits caus­es unde­fined behav­ior. Some com­pil­ers will assume that the shift amount is at most n-1, and use that infor­ma­tion to opti­mize the code.

Out of bound array access caus­es unde­fined behav­ior. Some com­pil­ers assume that a vari­able that is used as array index will never exceed the array size, and use that infor­ma­tion to opti­mize the code.

Such opti­miza­tions are allowed, how­ev­er prop­er diag­nos­tic mes­sages are high­ly desired to enable the pro­gram­mer to ver­i­fy if the result­ing code behaves as intend­ed.

If you want to read more about the topic of this blog then this sci­en­tif­ic arti­cle What every com­pil­er writer should know about pro­gram­mers -- or -- “Opti­miza­tion” based on unde­fined behav­iour hurts per­for­mance could be of inter­est to you.

Conclusions

Unde­fined behav­ior rep­re­sents a dan­ger­ous trade off: it sac­ri­fices pro­gram cor­rect­ness in favor of per­for­mance and com­pil­er sim­plic­i­ty. In essence, unde­fined behav­ior cre­ates a sep­a­ra­tion between unsafe oper­a­tions and error checks. Due to this sep­a­ra­tion the checks can be opti­mized, which can result in a par­tial or com­plete elim­i­na­tion of the check.

Com­pil­er devel­op­ers typ­i­cal­ly take pride in imple­ment­ing advanced opti­miza­tions and take a legal­is­tic approach when inter­pret­ing the ISO-C stan­dard to improve bench­mark results. This can lead to sit­u­a­tions where the behav­ior of the opti­mized pro­gram does not cor­re­spond with the inten­tions of the pro­gram­mer.

Some peo­ple argue that a com­pil­er should pre­serve the intent of the code. But the com­pil­er does not, and should not, spec­u­late about pre­sumed intent. The cor­rect­ness of a com­pil­er should be sole­ly judged by how close­ly it con­forms to the stan­dard. Other qual­i­ty attrib­ut­es such as the speed and size of the exe­cutable code affect the eco­nom­ic com­pet­i­tive­ness of the com­pil­er, and whether or not a com­pil­er is suit­ed for the develop­ment of safe­ty crit­i­cal soft­ware large­ly depends on qual­i­ty of the diag­nos­tic mes­sages.

Com­pil­er devel­op­ers shall care­ful­ly bal­ance the diverse require­ments of all stake­hold­ers and not exploit the weak­ness­es in the ISO-C lan­guage def­i­n­i­tion in their attempt to increase bench­mark scores. (This should not be read as an excuse for bad bench­mark per­for­mance. TASKING com­pil­ers obtain best in class scores on code speed as well as code size, while keep­ing your code safe.)  

MISRA and CERT have defined addi­tion­al rules for safe and secure cod­ing in the C pro­gram­ming lan­guage. Opti­miz­ing com­pil­ers that have built-in checks to detect vio­la­tions of these rules warn the pro­gram­mer when opti­miza­tions are applied that may change the intent of the pro­gram, and are well suit­ed for the develop­ment of safe­ty crit­i­cal code.

Scroll to Top