Testing is Not Verification

When I’m teaching a programming class, my students often hear me ask the question, “How do you know your program works?” I think many of them learn to dread that question, like when the dentist asks if you’ve been flossing. There are a number of ways to answer this question, but most of the responses I get when I ask it are more or less equivalent to, “Well, you know, you try it out, and you see what happens—and if it seems to do the right thing for all the cases you can think of, then you figure it is probably all right.” In other words, you test it; a nice, simple, practical solution.

No matter what kind of problem you are trying to solve, testing is an excellent idea. However, Jon Bentley quotes Edsger Dijskstra as having said:

“Testing can show the presence of bugs, but not their absence.”

A moment’s consideration should convince you that Professor Dijkstra was correct—if you test your solution and it fails, surely there is a problem. A test that passes, however, says at most, “the problem is not here.” To truly verify that a computer program will always give the correct answer is a much more complicated process than simply testing it. Nevertheless, it seems to me that many programmers equate “testing” with “verification” when speaking about the correctness of their programs. This equation is false, however, and that the assumption of its truth is behind a large proportion of the bugs found in software in the wild.

A couple of months ago, I was talking with David Aucsmith from Microsoft’s Security unit, who made an interesting argument: He says that if we want to be able to write software that is secure and bug-free, then throughout the development process, we need to keep in mind that we have an adversary. Not just one adversary, in fact, but many. An adversary is someone or something that acts contrary to your best interest — at best, an unsympathetic user; at worst, a malicious programmer with mischief on his mind. As I see it, a programmer’s adversaries fall into three general categories: The problem, the users, and anybody else who wants something from the users.

The first adversary is the problem itself. You want the computer to do some work for you, and you need to instruct it. This is what we usually think of as programming—identifying problems, choosing algorithms, and implementing them. This process can be challenging, but that’s the meat and drink of software development. But now, let’s take the adversarial view, by personifying the problem: Let’s say that the reason development is challenging is that the problem does not want to be solved. This sounds like a silly idea, but stick with me for a moment: A problem that doesn’t want to be solved is swift and clever, but not very strong—it won’t attack you, but it will twist and hide. It will resist easy definition, and throw up straw-men that expose only its simplest, easiest cases, to lead you down the garden path. It will lure you toward definitions that are clear, simple, and wrong, and tantalize you with premature optimizations.

When knowingly faced with such a problem, you have to be prepared to ask yourself, “How can I be sure I really solved this problem?” After all, the problem is trying to avoid you—did you really get the hard cases right? Were your assumptions all sound? Did you leave anything out? If you can’t answer those questions to your own satisfaction, and that of your employer and your customer, you’d better go back to the drawing board.

The second adversary is the user of your program. The user wants the computer to process some data, and your program purports to do that for her. She doesn’t want to think about the details of algorithms and implementation; she simply wants to say, “Here are my data, here’s my question…go!” She doesn’t hate you, per se, but she has no desire to spend hours learning about XML templates and query languages; ideally, it would be enough to put your program and her data in the same location with a couple of drinks and a bag of pretzels, and let them figure out the details on their own. She knows that can’t happen, but she wants maximal results for minimal complexity. She also expects the program to be forgiving about the format of its inputs, and to give her the chance to back up and correct mistakes before doing anything irrevocable. She’ll give your program bad data, abuse the user interface in ways you never intended, and practically never read more than about a page of the documentation you so carefully wrote. You do write user documentation, don’t you? Well, you’d better start. More importantly, however, she’ll use your program to solve problems whose implications you never considered, and even if you wrote a careful disclaimer to protect yourself from legal liability, it’s possible you could be morally liable if something really bad happens.

Defending against users has to begin at the earliest stages of the design process. You can’t simply solve the underlying problem, then go back and slap together an interface for it (although we’re all guilty of doing so); you need to give the user a mental model for how your program works, so that she will be able to figure out how to use it without breaking it.*

* It has been said that applying technology is the art of finding the appropriate wrench to use to pound in the correct screw.

The third adversary is anybody who wants something from the user of your program—and I do mean anybody. If your software gets popular enough, even advertisers might want something from it. But what I’m really talking about here is the malicious species of hacker, who wants your user to give him control of her computer so he can work his wicked will. He is looking for the flaws in your assumptions, the errors in your design, and the bugs in your code. He’s going to read your source code, or, if the source isn’t available, he’ll disassemble your object code and reconstruct your algorithms that way. For this adversary, it is no longer enough for you to simply state your assumptions, you will also have to check them. If you don’t, he will find a chink in your program’s armor, and use it to get inside somebody else’s computer—or maybe even your own. At a minimum, he might want to embarrass you; at worst, your software might turn out to be the back door into World War III.

So, in short, you have at least three adversaries: A problem that doesn’t want to be solved, a user who will use the solution in unexpected ways, and an attacker who is going to pick apart your solution looking for ways to break it. This doesn’t sound like programming, it sounds like politics! And this doesn’t even begin to cover attorneys, who are professional adversaries for just about everybody (including each other), and who will happily chew your clothes off in court if you give them a reasonable excuse. The point I’m trying to make here is that we can’t afford to write programs as if everybody out there loves us and wants us to be happy, because not everybody does. This is not a viewpoint we customarily teach our students, although you could argue that “adversarial programming” is basically just a stronger take on “defensive programming,” which we do supposedly teach.

At the universities, we do talk a bit about defensive programming, but rarely ever really require our students to apply it in their classwork. In part, this is because education is supposed to be a cooperative effort, not a competitive one. One unfortunate result of this neglect, however, is that university graduates in computer science leave with plenty of knowledge about data structures, algorithms, architecture, networks, and theory, but very little (if any) practise at programming defensively. Typical example programs in a computer science course make unstated assumptions about input values, use I/O without checking for errors, make algorithmic simplifications that destroy the generality of the solution, and almost never come with a clear and precise description of what it means for the solution to be correct.

So what makes a program “correct?” There are two important features: It must do all the things it is supposed to do, and it must do none of the things it is not supposed to do. Proving that a complicated piece of software is truly correct is not at all easy. So, how do you know your program is right? In practise, you test it. But testing is not the same as verifying its correctness. You test by creating test cases: You run your program with carefully-chosen inputs, and examine the program’s outputs. If the program produces the expected results (or behaviour), you would say the program passes the test; if not, the program fails the test.

Generating good test cases is a dark art. Simple cases can often be generated by hand—simple inputs or workflows whose results can be easily predicted. But the point of using a computer is so you can solve big, hard problems, whose answers are not easily obtained by manual labour—so we often build simple test cases by hand, and simply hope that the more complicated cases will be okay. Another common approach is differential testing, which basically means you run a fixed set of test cases with several different programs purporting to solve the same problem, and compare their answers. If they all agree, you consider it a good result; otherwise, you take a majority consensus (if any) as the “correct” result. But all of this sidesteps the basic problem, which is how to know you’ve chosen good test cases in the first place.

There are a few ground rules that should apply to any testing scenario:

  1. Vary all the input parameters. If possible, choose tests that vary each input parameter independently of the others. This can easily make the combinatorics of testing grow out of hand, but you should isolate as much as possible.
  2. Execute every path. Every possible path through your code should be tested; if a conditional has n branches, that means you need n tests targeting that conditional.
  3. Generate every error. Every possible error condition your program generates should be tested. This may require a test framework in which “error” is the correct answer.
  4. Cover interesting segments of the input range. There are often a variety of different “classes” of input for a problem; you should be sure to get some test cases for each important class.
  5. Generate out-of-range inputs. Whether or not the program is designed to handle bad input, you should generate some test cases with bad input. In such cases, a “correct” program should neither crash nor give a misleading result.

This adds up to a lot of tests to write. However, if you’re serious about making sure your programs work, and you aren’t willing to invest the effort of proving your programs correct formally, this is your only real option. Plus, good test hygiene will make it easier to fix bugs you find later. Axiom: Every bug hides behind the word “should”—either the program should do something but doesn’t; or the program should not do something, yet does.

Even if you design your tests very carefully, you can get in trouble, however: I once wrote a library to perform arbitrary-precision integer arithmetic in C. To test it, I generated a large set of test cases by choosing random values and computing the “expected” outputs with GNU bc. As a primitive form of differential testing, I cross-checked them by running the same tests in DrScheme. From these, I chose tests that exercised every line of code in the library. Several months later, I had a bug report which turned out to be caused by a mistake in one of the assumptions I’d made in implementing integer division; although the tests had all passed, I hadn’t found this bug simply because it only affected numbers with a certain structure. Perhaps I ought to have recognized that this was one of the “interesting” input ranges, but I had not, and so this bug escaped all my careful tests.

Some developers advocate a somewhat more extreme approach to development, known as test-driven development (TDD). The idea behind TDD is that you design your tests before writing any code, and develop your code to meet the tests. Somewhat surprisingly, this strategy seems to work quite well in practise; it imposes a discipline of writing code that can pass interesting tests, and forces us to build programs that can be regression-tested. I think TDD is basically a good idea, so long as you program “in good faith.” As a limiting case, you could write a program that simply compares its input to the test conditions, and gives the correct response; such a program would “pass all its tests,” but would violate the spirit of TDD.

The point I’m trying to make here is that writing correct programs is difficult, and while testing might seem to be an easier way to get there than logic and proof, “there ain’t no such thing as a free lunch.” If you shortcut the process with weak testing, you’re not just getting off easy, you’re abdicating your ethical responsibilities as a programmer. In a world where software controls more and more of the critical infrastructure we depend upon, from power and banking to medical treatment, we cannot afford to ignore those responsibilities.

The Elements of Commenting Style

I have, of late, been teaching some of the early programming courses in the department of computer science—as a result of which students frequently come to me with questions like, “How should I comment my code?” The idea that code should contain comments is drilled into our heads by the teachers of introductory programming courses, who make “good commenting style” part of our grades, and deduct points if we do not do it right. But what is “right?” Alas, the elements of good commenting style have remained something of a mystery, not only to students, but also, in my experience, to a large proportion of those who are employed as programmers in a professional capacity. This suggests to me that good commenting style is not something that, like swimming, can be taught by simply tossing the hapless student into the water and hoping for the best. I will therefore dedicate the remainder of this essay to an opiniative exploration of the elements of good commenting style.

Let us begin with the following question: What is a computer program?

If you’re trying to be precise, you might say that “a computer program is a concrete description of certain actions that are to be undertaken by a computer in order to achieve a result.” More simply, however, a program is a very careful letter, written by a human, to be read and interpreted by a computer.* Unlike the letters we write for other humans to read, computer programs must be written with a painful exactitude that is difficult to capture in the fluid semantics of ordinary human language. Therefore we do not write programs in “plain language,” but instead use carefully constructed symbolic languages—programming languages—whose meanings are spelled out in precise mathematical terms, so that there can be no ambiguity.**

If our only concern were to insure that our programs are precise and unambiguous, we would write our programs in the native binary language of the computer’s processor. The first stored-program computer (the Manchester Mark 1) was programmed in this way, and that was a big improvement over having to physically change switches and move wires (cf. ENIAC). But as computers have grown, so have programs—and where they once had a handful of memory words and a card-punch for output, computers now have gigabytes of RAM and hundreds of gigabytes of storage, dozens of different input devices, high-resolution raster displays, fast networking interfaces, and a multitasking operating system. Trying to implement a complex software system on a modern computer using only binary machine language is a certain path to flaming damnation, not to mention extreme frustration. It’s easy to see that we should consider issues beyond merely precision and ambiguity when we design programming languages and write programs using them.

What’s more, significant software is rarely written by a single programmer acting alone. Even when we do work alone, we use a lot of software created by other people to get things done, both as tools (e.g., editors, compilers, build scripts) and as components of the software we build (e.g., system API’s and libraries). Moreover, we usually write our programs without the benefit of having the author of these components available for close questioning, and once written, our own software is used by other people to help solve problems even its creator may not have foreseen. For such collaborative development to work at all, we need the ability not only to write programs, but also to read programs.

From this, we may conclude that a program is not simply “a letter from a human to a computer,” but instead a kind of public essay or an open letter, potentially a work of both art and scholarship as well as a product of manufacture. Under this view, the audience for a computer program includes not only the computer for which it was originally written, but a variety of other interested parties, including:

  • Other computers. Different computers have different basic instruction sets. A program should have a sensible interpretation regardless of the basic instruction set of the computer, provided its instruction set is reasonable.
  • Other programmers. A program should be comprehensible to other programmers, so that they can use it as a component in a larger software system. Programs are also an excellent teaching vehicle for esoteric programming skills.
  • Researchers and scholars. The assumptions and inner workings of the program should be accessible to those who wish to study how the program works. A great deal of art goes into the creation of a program, and sometimes there are both brilliant insights and unintended consequences encoded there.

The need for programs to communicate with humans as well as computers is what gave rise to the idea of commenting. It’s such a simple idea that we take it completely for granted: In short, a comment is a message—written in plain language—imbedded in a computer program. Most programming languages contain some facility for including comments, and the computer generally ignores them (although occasionally programmers will use comments to convey machine-readable metadata about the program); the important idea is that the comments are kept with the program so that a human who has to figure out what the program is doing will have some input beyond the symbolic content of the code itself.

Taking a slightly more general view, you can think of a program as being a document in which machine-executable program code alternates with human-readable text—this is the whole idea behind literate programming. The code gives the work orders to the machine, and the text tells the story of the code—how it works, what it’s for, what assumptions it makes, who wrote it, where to find more information, and much more. Most programs use specially-formatted text to represent code, but there’s no reason it has to be that way; text just happens to be a particularly convenient medium for moving information from place to place, so it’s a natural fit for our collaborative idea of software development. Code represented as text also happens to integrate rather nicely with human language. As long as the computer can distinguish which text should be interpreted as code and which should be interpreted as human language, the system works out just fine.

Most programming languages operate under the assumption that a program is mostly code, with some documentation thrown in as comments to the code. Some languages reverse this relationship (PHP being a notable example), but in either case the basic idea is to have a special “marker” that indicates where code ends and comment text begins, or vice versa. In a C program, for instance, any text between the /* and */ markers is considered a comment, and is ignored by the C processor. In Pascal, the markers are { and }. Scheme uses a semi-colon (;) to mark the beginning of a comment and an end-of-line character to mark the end. PHP programs are imbedded in HTML documents by marking the beginning and end of a PHP program segment with <?php and ?> respectively. There are plenty of reasonable ways to make it work.***

The interesting question, and the one that started this whole discussion, is “What should you put in your comments?”

With the preceding discussion in mind, I would like to propose the following informal rule for deciding how, when, and where to write a comment in your code:

You should include a comment for any feature of your program that cannot be inferred by reading the text of the code. Conversely, you should not comment any feature of your program that can be inferred by reading the text of the code.

Let’s focus on the two components of this rule, with examples:

  1. “You should include a comment for any feature of your program that cannot be inferred by reading the text of the code.”

    If your code makes any unwritten assumptions about the structure of its input, you should comment those assumptions (“This procedure assumes its inputs are sorted in nondecreasing order”). If you are using a well-known algorithm, you should make it clear what algorithm it is (Ex.: “This is Dijkstra’s algorithm for single-source shortest paths,” or name the procedure descriptively). If you are using a variation on a well-known algorithm, you should clearly describe the differences between your version and the standard one (Ex.: “This version of the search algorithm chooses successor states in decreasing order of rank”). If you are using an algorithm that is not well-known, you should make it clear what your algorithm accomplishes (Ex.: “After completion of the inner loop, the value of x contains the dot product of row i and column j“). If you are using an unusual or clever encoding of data, describe it (Ex.: “The bottom three bits of the flags field give a 2’s complement offset of the vector key from the starting position”). State any postconditions that are not enforced by the type system (Ex.: “At exit, p is strictly less than q“).

  2. “You should not comment any feature of your program that can be inferred by reading the text of the code.”

    Do not describe the actions of control structures built into the language, unless you are using them in some strange way (cf. Duff’s device; Bad: “Loop over all the elements of list L.”) Do not state assertions that are enforced by the type system (Bad: “Assumes z is an integer” when z is declared or automatically inferred to be of integer type). Do not interpret badly-chosen variable names in comments—rename the variables instead (Bad: “Increment the number of cabbages that have been sent to the packing machine, denoted nc.”) Do not summarize the sequential actions of your algorithm—instead, make sure the code makes it clear what they are (Bad: “Call sort to put the list in order, then find the smallest element and print it out”).

This rule has both a positive prescription and a negative prescription. The positive prescription (1) is to include any information that the human reader will need to know in order to figure out what the code is doing, what assumptions it makes, and how it connects to a larger picture, that is not contained in the code itself. This helps the reader put the program in context, and helps them understand your mindset at the time you wrote the code. The negative prescription (2) is to leave out any information that is already explicit in the code, to reduce the risk of comments that are misleading or wrong. The code describes what the computer is actually going to do, and if you try to describe it again in a comment, there’s a significant chance you will describe it incorrectly (or at least incompletely). It’s far better to omit a comment entirely, than to write one that is factually untrue. By nature, a human reader will gravitate to the plain language first, seeking insight, and if you tell her lies, you will wind up doing more harm than good. Let her read the code, if she wants to know what the program will do.

While these rules are not sufficient in themselves to tell you how to write good comments, I have found they are a good self-check mechanism to use while programming. If you catch yourself writing “bad” comments to make up for sloppy programming, you know it’s time to refactor or rewrite the code. When you read somebody else’s code and have trouble understanding how it works, you gain some insight as to what would have made it easier to follow; you can turn such insights around and use them in your own programs. Of course, no rules as simple as these will cover every possible situation, but at least they form a basis upon which to build.

So, when somebody asks me “How should I comment my code?” this is the answer I usually give. It would be easy enough to get lost in the trivialities of “style,” and talk about spelling and capitalization, block comments vs. line-end comments, machine-readable comments, automatic generation of printable documentation, and the eternal squabbles over efficiency of processing. But as far as I am concerned, the most important features of any program documentation are that it should tell the truth—and nothing but the truth—so that the program will be comprehensible to all of its many patrons.

* The modern conceit is to assume that the word “computer” is a shorter way of saying “computing machine.” Not too long ago, “computer” was a job description, and the phrase “automatic computer,” which sounds quaint to modern ears, was a necessary semantic distinction. However, since the modern interpretation is now practically unambiguous, I will presume it henceforth.

** Or at least, so that ambiguity is minimized. If you interpret this definition strictly, some of the common languages we use to communicate with computers, such as C++ and Perl, would not count as “programming languages.” Since the point of this essay is comments, not politics, I will leave that topic for another essay.

*** There is, however, a perennial problem of how to include the segment delimiters in the middle of the segment without interpreting it as a segment delimiter. This use-mention distinction is usually handled by some sort of more or less elaborate quoting mechanism that lets the programmer say, “this next occurrence of the delimiter should be treated as regular text, not a delimiter.” Then, of course, you need a way to quote the quotation marker, but this minor problem can resolved with a small amount of care.