The Myth of Portability

For many open-source software projects, C is the programming language of choice. While there are occasionally good reasons for choosing C over another, higher-level programming language, many projects seem to use C because it is supposedly more “portable” than other programming languages. The portability of C programs is often cited as one of its virtues as a development target; however, I will make a case that, upon closer inspection, C is actually not a very portable programming language at all.

For this article, I will follow Wikipedia in defining software portability as, “[the ability] to reuse the existing code instead of creating new code when moving software from [one] environment to another” (bracketed additions mine). In other words, a program is portable if you can run the same code without modification on a variety of different hardware and operating system platforms.

On the surface, C programs appear to be extremely portable. C compilers are available for practically any combination of hardware and operating system you can imagine, and some of these are of extremely good quality (e.g., GCC and LLVM, to name two). Thanks to tools like autoconf, it seems possible to compile and install many large software packages written in C without having to modify a single line of source code. Upon closer inspection, however, you will see that these tools give only the illusion of portability; under the covers, you are in fact compiling different code depending upon details of your platform. While it’s true that you are not modifying the code yourself, modifications—and sometimes substantial ones—are taking place quietly behind the scenes.

Compiling and installing a typical C program from source involves typing (or having a package management program type for you) commands similar to these:

# Download the package
% wget http://package.location.com/path/to/tarball.tar.gz
 
# Unpack the source
% tar -xzf tarball.tar.gz
 
# Configure
% cd tarball/
% ./configure
 
# Compile
% make
 
# Install
% make install

In a typical package, configure is a shell script generated by autoconf that roots around in your system to figure out which variations of the code should actually be compiled. This is necessary for a couple of reasons; for one, different platforms may assign different meanings to a given C expression. For a trivial example, consider the following C declarations:

int x;
long y;

According to the ANSI C standard,* you can be sure that x can accept any integer value in the closed interval [-32767, 32767], and that y can hold any value x can, but you cannot be certain of the exact sizes of these two variables from their declarations alone. A program that cares about the exact sizes and value ranges for these data types must therefore be modified to suit the local requirements. Typically, this is done using the C preprocessor: The configure script may spit out a config.h file containing, say,

typedef int my_int_type;
typedef long my_long_type;

* By which I mean ISO/IEC 9899:1990. The C99 standard helps fix trivial examples like this by adding the stdint.h header (also included by inttypes.h), which defines size-aware types like uint8_t and int32_t. For programs written before C99 was widely available, and for compilers that do not support C99, you still need to patch. Both are common.

The rest of the program uses my_int_type and my_long_type instead of int and long, and obtains the above definitions via the preprocessor directive

#include "config.h"

In short: You are compiling different code depending upon how your platform defines its basic types. The fact that the preprocessor types in the changes instead of the end-user is irrelevant; the important truth is that the input to the compiler is different.
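The variation in basic type sizes is easy to observe without writing any C. The following is only a minimal sketch, and it assumes you have a Python interpreter handy; it uses the standard ctypes module, which reports the sizes that the local platform's C types actually have. It is not part of any configure script, just a quick way to see the differences that such scripts must detect.

```python
import ctypes

# Sizes, in bytes, that this platform assigns to some basic C types.
sizes = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "void *": ctypes.sizeof(ctypes.c_void_p),
}

for name, size in sizes.items():
    print(f"{name:8s} {size} bytes")
```

On a typical 64-bit Linux system, long comes out as 8 bytes; on 64-bit Windows it is 4 bytes, even though int is 4 bytes on both.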

This problem is not limited to type declarations, either! Except for the basic file I/O operations provided by the C standard library, C programs must also cope with differences in the operating system interface in order to access the network, draw on the screen, play sounds, create threads, and a host of other interesting tasks. POSIX helps a bit, by providing (somewhat) standard interfaces for other system services, but in my experience, the degree of POSIX support is inconsistent across systems. Many platforms provide most of the POSIX.1 Core Services suite, but there are plenty of compatibility problems. As a result, configuration scripts usually wind up generating definitions like this:

#define HAS_FEATURE_X
#define NO_FEATURE_Y
#define FEATURE_Z_VERSION 2

This seems innocent enough, until you look at what happens in the rest of the program: Throughout the code, you will find sections like this:

#ifdef HAS_FEATURE_X
  /* (A) ... some code using feature X ... */
#else
  /* (B) ... some gross workaround using other features ... */
#endif

Once again, we see that the compiler is given different code, based on platform differences. You might think you’re compiling the same file on different systems, but you’re not! On systems where Feature X is supported, the compiler gets section (A) of the code, and elsewhere it gets section (B). You might as well just have two different copies of the file, one with (A) and the other with (B). Sure, the version with the preprocessor takes up less space on disk, but the outcome is the same: You’re compiling different code on systems with Feature X than on systems without it.

You might think this distinction doesn’t matter. After all, if I can compile the program without having to make changes myself, does it really matter if the code is the same? I argue that it does matter, for the following reasons:

  1. Finding and fixing bugs is much more difficult in code that varies by platform. A test-case that manifests the bug on one platform may not reveal any errors on another. Since distributed development is very common in open-source projects, this is a real issue. If users can’t easily report bugs, or developers can’t easily reproduce the bugs users have reported, the project may suffer.
  2. Auditing code for security vulnerabilities is virtually impossible. Does the program contain buffer overflows, leak sensitive data, maintain security invariants, and so forth? The only way to know is to generate every possible variation of the complete program, and audit each one separately. Automation of this process is blocked, because some of the code will only compile on some platforms; so even if you have a tool that can find problems, where do you run it?
  3. Maintenance can easily cause variations to get out of sync. You wanted to add a feature to your program, but it requires editing different blocks of code for different platforms. Can you be sure that both are updated correctly, and in a timely fashion? Are your bug-fixes applied consistently across all the platform variations your program includes? Answering these questions takes up a great deal of time and energy within many open-source development projects.

It would be reasonable to conclude, from the above examples, that the real portability problem isn’t caused by C, but by the C preprocessor. Indeed, if you were to write a C program without using the preprocessor at all, the odds are good that it would be quite portable. Such a program would also be virtually useless, however, since even the most basic features of the standard C library (such as file I/O) require inclusion of library headers, whose exact definitions may (and do) vary by platform. Even among different installations of a single compiler such as GCC, the contents of headers change across operating systems. More importantly, the C preprocessor is part of the language; it’s specified (though mostly by example) in the ANSI C standard, and is an essential part of the C translation process. You can’t actually have C without it, or without the attendant portability issues it causes.

If you’ve read this far, I hope you are now alive to the idea that C programs, despite appearances, are rarely ever actually “portable” in the sense that they run “without modification in multiple environments.” I do not intend, by this, to argue that you shouldn’t program in C; there are some problem domains for which it’s a very good choice. However, if you are planning to develop in C because you believed it was “portable”, I hope that you will now reconsider your thinking on the matter. The unportability of C programs is not obvious, but it is an important truth.

So how can we fix it? Well, the easy solution for new projects is to pick a different language. I have opinions on that too, but it’s a matter for another post. Actually “fixing” C would be an enormously difficult undertaking; that said, I think most of the portability problems with C arise because the language specifies too little: The sizes and formats of basic types, structures, unions, and pointer semantics are all underspecified. There is no module system, and no way to resolve namespace conflicts except by rewriting code. Preprocessor conditionals and macro expansion do not respect syntactic or semantic boundaries. The standard library is extremely primitive. There is no standard ABI, linkage format, or function call protocol (although there are some pretty stable conventions). The interpretation of declaration keywords like const, register, volatile, and restrict is so loose that you cannot predict what the compiler will do with them. The portability of C would be greatly improved if the standard simply became more opinionated; if, as in Java, the sizes and formats of the primitive data types were clearly defined, the layout, packing, and alignment of structures specified, the preprocessor greatly restricted, and the rules for implicit type conversions simplified. Doing this now, however, would disrupt decades’ worth of code that was written to work around a poor standard, and would probably cause more trouble than it would solve.

Nevertheless, the take-away message here is that a program whose compiler sees different code, depending upon the environment in which it runs, is not portable. And, if you feel—as I do—that portability is a virtue worth consideration, this should concern you.

Games with Names

Suppose you wanted to inject a new function into a Python module, based on the value of an existing variable in that module. If that statement made no sense to you, or if you can imagine no reason why you might want to do such a thing, then you can safely skip the remainder of this post. Suffice it to say, then, that you want to do this thing: How might you go about it?

My first attempt looked something like this:

m = __import__("modulename")
 
expr = getattr(m, 'SPECIAL_VALUE')
def new_func(x):
  return stuff_involving(x, expr)
 
setattr(m, 'new_func', new_func)

This seems to be fine; the definition of new_func closes over the binding of expr visible at the time of its definition, and all is well. Unfortunately, a problem arises when you attempt to move this code into a loop, so that you can apply the same operation to a bunch of modules, e.g.,

for mod_name in module_list:
  m = __import__(mod_name)
 
  expr = getattr(m, 'SPECIAL_VALUE')
  def new_func(x):
    return stuff_involving(x, expr)
 
  setattr(m, 'new_func', new_func)

Now, the problem here is that each time the new_func definition is encountered in the loop, it captures the same binding of expr that all its predecessors had. The value changes from one iteration to the next, but the binding remains the same, with the result that all of the closures wind up with whatever value is left in that binding at the end of the loop. In general, this is not what you want.
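The effect is easy to reproduce in isolation. This is a minimal sketch, not the original code: build_all and the string values are hypothetical stand-ins for the module loop and for SPECIAL_VALUE, and the function body just returns a tuple so that the captured value is visible.

```python
def build_all(special_values):
    funcs = []
    for expr in special_values:
        def new_func(x):
            # expr is looked up when new_func is called, not when it is defined
            return (x, expr)
        funcs.append(new_func)
    return funcs

funcs = build_all(["alpha", "beta", "gamma"])
results = [f(0) for f in funcs]
print(results)  # [(0, 'gamma'), (0, 'gamma'), (0, 'gamma')]
```

All three functions report the last value, because they share one binding of expr.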

So, how can you get around this? You could probably reach inside the function object and manually twiddle its environment, but I do not know of any reliable way to do that in Python, and besides, I consider that kind of behaviour to be morally dubious. In the end, the easiest solution I could find was to force each iteration to create a new binding for expr, by wrapping the definition inside another closure, e.g.,

for mod_name in module_list:
  m = __import__(mod_name)
 
  def t():
    expr = getattr(m, 'SPECIAL_VALUE') # A fresh binding each time
 
    def new_func(x):
      return stuff_involving(x, expr)
 
    return new_func
 
  setattr(m, 'new_func', t())

This now works as intended. Solutions like this fall under the category of “sleazy, yet effective”. The only thing that is really objectionable here is that there isn’t any way to make a first-class anonymous function in expression position. Python’s lambda would work in this simplified example, but if the body of new_func isn’t just a single expression, that won’t help. As far as I can tell, this is about the simplest solution to this problem, given that we assume the body of new_func doesn’t need to access other globals from inside the namespace of the target module.
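For comparison, here are two other idioms commonly used to get a fresh binding per iteration, offered as a sketch rather than as a definitive answer. The names make_new_func and other_func, the stand-in body of stuff_involving, and the in-memory modules built with types.ModuleType are all assumptions made so that the example runs on its own.

```python
import types

def stuff_involving(x, expr):
    # Hypothetical stand-in for the real computation.
    return f"{expr}:{x}"

def make_new_func(expr):
    # Each call creates a fresh binding of expr for the closure below.
    def new_func(x):
        return stuff_involving(x, expr)
    return new_func

# Stand-in modules built in memory instead of imported by name.
modules = []
for name, value in [("mod_a", "A"), ("mod_b", "B")]:
    m = types.ModuleType(name)
    m.SPECIAL_VALUE = value
    modules.append(m)

for m in modules:
    # Variant 1: a factory defined once, outside the loop.
    m.new_func = make_new_func(m.SPECIAL_VALUE)

    # Variant 2: a default argument, evaluated when the def statement runs.
    def other_func(x, expr=m.SPECIAL_VALUE):
        return stuff_involving(x, expr)
    m.other_func = other_func

outputs = [(m.new_func(1), m.other_func(2)) for m in modules]
print(outputs)  # [('A:1', 'A:2'), ('B:1', 'B:2')]
```

The default-argument variant is shorter, but it changes the function's signature: a caller who passes a second argument overrides expr. The factory variant keeps the signature intact and is essentially the wrapper closure above, hoisted out of the loop.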

If anyone knows a simpler way to do this, though, I’m all ears!

Executive Privilege

There has been a lot of foofaraw in the news lately, because U.S. President Obama has expressed his wish to broadcast a televised address to our nation’s schoolchildren. Apparently, some parents have objected most vehemently to having their school districts show this address to their children, and some school administrators have said they simply won’t show the broadcast in their schools, or will leave the choice about whether to do so to individual teachers.

In my view, no citizen of the United States of America, irrespective of age or relationship, has the right to deprive any other citizen of the chance to hear any public message delivered by the duly-elected Executive Officer of this country. That includes parents and their children: No parent has the right to refuse her children the right to see the President and hear what he wishes to say. Neither adults nor children should be forced to listen to the President’s words; however, no citizen, regardless of age, may justly be deprived of the opportunity to hear what he has to say. The President is the properly chosen head of our Federal government; as such, it is right that he should be able to address the public at any time, for any reason, without interference.

Any school that is funded, in whole or in part, by Federal tax dollars, should take it as a civic responsibility to ensure that President Obama’s message is made available to any student who wishes to hear it. Any school that gets tax breaks because of its educational mission, or that benefits from government subsidies for financial aid, should do the same. Time should be set aside from the normal schedule, so that no student will have conflicting obligations, and the full broadcast should be presented, without comment, to any student who wishes to attend. No student should be required to attend the presentation, but none should be prohibited from doing so either. This is not a matter in which local bureaucrats have the moral authority to refuse. Nor, in my view, do parents.

As private citizens, we have a great many rights and freedoms, and our President is not a King. Neither, however, are we an anarchic mob, in which every woman and man may do as they see fit without reason or limitation. He may have been elected from among the people, and be addressed no more formally than as “Mr. President”, but during his tenure in office, the President is not just another citizen. We have entrusted him with enormous power and responsibility, above and beyond our own. As individuals we need not obey his orders, except under very specific conditions, but it is our moral, civic, and sometimes legal responsibility to listen to what he has to say. We need not agree with it, and we need not comply, but we ought always to listen. Even if the things he says offend us, frighten us, or outrage us, we must listen. The time to make choices was in the voting booth. Now, we must abide by the government we have, even if we don’t like it.

Parents are legally responsible for their children, but children are not chattels. If you forbid your children from listening to Mr. Obama’s speech, you are violating another citizen’s rights, plain and simple. Children are not the equals of adults in matters of responsibility, but neither are they mindless possessions. If you disagree with the President, the correct response is not to lock your children in the basement. Instead, listen to the speech together with your children, and afterwards you can tell them why you disagree. His speech will not be private; it may contain special messages for schoolchildren, but it will be heard by all. Feel free to rant and rave, call him names, and castigate his policies from stem to stern—but for the love of god, do not presume to stop your children’s ears against his words.

According to Alan Silverleib at CNN, “The White House said the address, set for Tuesday, and accompanying suggested lesson plans are simply meant to encourage students to study hard and stay in school.” It would be utterly moronic to refuse a schoolchild the right to hear that message. But even if you suppose his real plan is to say something completely different, you have no right to interfere.
