Undefined Behaviors in ISO‑C: Their Effect on Your Embedded Software Part 1

Feb­ru­ary 22, 2018


Optimizing compilers sometimes give you…well, unexpected results. You have probably seen this before, but maybe weren’t sure what happened and why.

This two-part blog is about the undefined behaviors that exist in ISO‑C, the way optimizing compilers make use of them (which is often not well understood by programmers), and the unpredictable software bugs that result and that frequently show up in code attempting security checks.

Although the ISO‑C language is widely used to build safety-related software, ISO‑C is not a “safe” programming language. Errors are not trapped as they happen; instead, after executing an erroneous operation, the program continues, but in a silently faulty way that may have observable consequences later on. Furthermore, the ISO‑C standard specifies a long list of circumstances, called “undefined behaviors”, in which no requirements on the behavior of the program are imposed. Compilers are not required to diagnose undefined behavior, and the compiled program is not required to do anything meaningful: it may crash, silently generate incorrect results, or coincidentally do exactly what the programmer intended.

Why does “undefined behavior” exist, and what’s good about it?

Failing to explicitly define the exact behavior of every possible program is not an error or weakness in the C language specification. Instead, it is an important feature that underpins the principles of the language, such as: impose few constraints on the programmer, allow low-level access to the underlying hardware while retaining (some) portability, and enable fast program execution and small code size.

By making the result of certain operations intentionally ambiguous, different CPU designs can be supported without sacrificing performance. Because no specific behavior is required, compilers are free to do whatever is most efficient for the target platform. For example, when adding two signed integers, the compiler does not need to verify and take action if the result overflows and becomes negative.
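To make the signed-addition example concrete: since the compiler inserts no overflow check of its own, a program that needs one has to supply it, and has to do so before the addition is executed. The following is a minimal sketch, not a definitive implementation; the helper name `checked_add` and its boolean return convention are assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (name and interface are illustrative only).
   Adds a and b only when the result is representable as an int.
   Returns true and stores the sum in *sum on success, false otherwise. */
bool checked_add(int a, int b, int *sum)
{
    /* Compare against the limits BEFORE adding, so the overflowing
       addition itself is never executed. */
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

A caller would test the return value, for instance `if (!checked_add(x, y, &result)) { /* handle the error */ }`, instead of inspecting the sum after the fact.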

What’s bad about undefined behavior?

Undefined behavior, particularly in combination with optimizing compilers, also has a dark side: it can cause very subtle bugs that have a critical impact on safety and security. Every programmer understands that dereferencing a null pointer or dividing by zero are erroneous actions that cause undefined behavior. Writing code to detect and handle such cases seems simple, but it is not. Even very experienced programmers are sometimes fooled by the precise meaning of their program when a legalistic interpretation according to the semantics of the ISO‑C standard is applied. Compiler developers often base their optimizations on such a legalistic interpretation of the standard. Sometimes the code that should detect and handle undefined behaviors is “miraculously”, but legally, optimized out of the executable code. I will go deeper into this in Part 2.
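As a small illustration of how a check can legally disappear, consider a null-pointer test placed after the pointer has already been dereferenced. This is a sketch under stated assumptions: the function names and the use of -1 as an error value are hypothetical, and whether a particular compiler actually removes the check depends on the compiler and its optimization level.

```c
#include <stddef.h>

/* Hypothetical example. The dereference comes first, so the compiler
   may assume p is not null and is then allowed to delete the check. */
int read_value_unsafe(const int *p)
{
    int value = *p;        /* undefined behavior if p is NULL */
    if (p == NULL) {       /* too late: may be optimized away */
        return -1;
    }
    return value;
}

/* The check precedes the dereference, so no undefined behavior has
   occurred when it is evaluated and it has to be kept. */
int read_value_checked(const int *p)
{
    if (p == NULL) {
        return -1;         /* illustrative error value */
    }
    return *p;
}
```

Both functions return the same result for every valid pointer; they differ only when `p` is null, which is exactly the case the check was meant to handle.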

Other examples of undefined behavior that may be considered “easily perceived and understood” include:

  • Read­ing from unini­tial­ized vari­ables
  • Signed inte­ger over­flow (notice that the behav­ior of unsigned inte­ger over­flow is defined!)
  • Shift equal to or greater than the width of the operand
  • Mod­i­fy­ing a vari­able more than once in an expres­sion
  • Array / buffer over­flow
  • Point­er over­flow
  • Vio­lat­ing type rules
  • Mod­i­fy­ing a const vari­able
  • Negat­ing INT_MIN
  • Modulo operation whose quotient is not representable, such as INT_MIN % -1 (before C99, the result for negative operands was implementation-defined)
  • Call­ing a library func­tion with­out ful­fill­ing the pre­req­ui­sites
  • Data races caused by con­flict­ing actions in dif­fer­ent threads
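A few of the items above can be guarded with very little code. The sketch below covers negating INT_MIN, shifting by the operand width or more, and the contrast with unsigned arithmetic, whose wraparound is defined. All function names are hypothetical, and the shift helper assumes `unsigned int` has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Negation: on two's-complement targets -INT_MIN is not representable. */
bool checked_negate(int x, int *out)
{
    if (x == INT_MIN) {
        return false;
    }
    *out = -x;
    return true;
}

/* Left shift: the count must be smaller than the width of the operand. */
bool checked_shift_left(unsigned int value, unsigned int count,
                        unsigned int *out)
{
    /* Assumption: unsigned int has no padding bits. */
    const unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);

    if (count >= width) {
        return false;
    }
    *out = value << count;
    return true;
}

/* Unsigned addition wraps modulo UINT_MAX + 1; this is defined behavior. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}
```

The first two helpers refuse the problematic input instead of executing the operation, while the third needs no guard at all because the standard defines its result.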

The list is vast: ISO-C11 specifies 203 circumstances that cause undefined behavior. Due to this large number and the subtleties involved, programmers cannot be trusted to reliably avoid undefined behavior, and the result can be programs that silently misbehave.

Furthermore, misbehavior due to undefined behavior is not easy to detect using dynamic tests, since in most cases the undefined behavior is exposed for certain inputs only. As a result, code that contains undefined behaviors may “work” for a while, and then “break” when ported to new hardware, after upgrading the compiler, or after changing its optimization level.
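A familiar case of undefined behavior that only appears for certain inputs is computing the midpoint of two indices. The sketch below uses hypothetical function names and assumes that both arguments satisfy 0 <= lo <= hi, as they would for array indices.

```c
#include <limits.h>

/* Passes any test that uses small values, but lo + hi overflows
   (undefined behavior) as soon as the sum exceeds INT_MAX. */
int midpoint_naive(int lo, int hi)
{
    return (lo + hi) / 2;
}

/* Assumption: 0 <= lo <= hi. Under that assumption hi - lo cannot
   overflow, and the result always lies between lo and hi. */
int midpoint_safe(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}
```

A test suite that only exercises small indices would report both versions as correct; the difference shows up only for values near INT_MAX.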

What do Safety Standards say about undefined behavior?

The topic of undefined behavior is not explicitly addressed by safety standards such as ISO 26262. Most safety standards refer to other (industry) standards that provide rules for safe and secure coding, such as MISRA‑C and CERT‑C. On the SEI CERT website you can find an overview of all undefined behaviors, including the coding practices that mitigate each specific case of undefined behavior.

Today some compilers, including all TASKING C/C++ compilers, detect violations of the coding practices advised by MISRA and CERT and warn the programmer accordingly. This helps ensure that the intentions of the programmer are retained in the compiled program.

In Part 2, I will show how a compiler can use undefined behavior to optimize the code and potentially outsmart the programmer in his endeavour to create safe code.
