
After my last article, I had a lot of interesting conversations with different people on whether software development is a form of engineering, as well as fascinating conversations on ethics in general. Not long after my last post, in fact, The Atlantic ran an interesting piece covering some of the same material, called Why Computer Programmers Should Stop Calling Themselves Engineers. (I have no delusions that their article is in any way related to mine; the timing is coincidental.) The general comments on the article boiled down to a couple of broad categories:

The Goggles^W Codes; They Do Nothing!

One of the most common critiques that I read was that engineers shouldn't have a code of conduct because such codes don't work. Several people commented that, considering the lack of licensing (and thus the low barrier to entry), as well as how few people would currently commit themselves to a code of conduct, such codes would only punish those who subscribe to them. Just like vaccines, codes of ethics only work when enough people are willing to commit themselves to them; until then, those who do can simply be dismissed and replaced with minimal disruption.

That's very true. Practically, I think these sorts of refusals would only convince some (maybe even a small) set of those pushing for unethical action. As is commonly said in the physical security world, "locks are on doors only to keep honest people honest." I think there is a broader philosophical question about whether or not one should continue to refuse to engage in unethical behavior even while believing that doing so is futile, but that's a subject for someone better educated than I to speak on.

Programming Isn't Ready

Another of the common critiques that I heard was that software is too new or underdeveloped a field to place these kinds of standards on yet. Other forms of engineering rest on well-studied principles with hundreds or thousands of years of study and discipline behind them, but software is a relatively new phenomenon, with radical changes in our understanding and in the state of the art coming over the span of years rather than decades or centuries. The oldest buildings still standing are roughly 4000 years old, while the oldest software application still running is, at most, 60 years old. (In all likelihood, even this number is generous, and the oldest extant software is closer to 50, or even 40, years old.) Some also argued that the classes of problems are different: physical construction has well-defined and well-understood limitations, and changes after construction has started are relatively expensive to make. Software, on the other hand, is rapidly iterated on in a very different manner, and that flexibility creates far more potential for ill-defined specifications and mistakes in implementation.

Again, true. It's certainly not in dispute that software engineering is a newer field than other engineering disciplines; even those who would argue that "modern" engineering is itself relatively recent in the span of history certainly wouldn't argue that it's as new as software. Still, this seems to me to be a relatively uninteresting argument: just because something is new, does that mean we should not strive to do better?

A good friend of mine, Michael Stone, provided what I believe to be a more compelling variant of this argument. With apologies in advance to him for any misstatements of his eloquent point: software failures, unlike failures in the disciplines firmly defined as "engineering", aren't generalizable. In engineering, failures can often be traced back to a more general category of problem; software systems that fail rarely share the same kind of fundamental design flaw. At most, the community is able to point to certain kinds of failures which are common (off-by-one errors, buffer overflows, and so on), but we don't respond to them in the same way.

Take the case of the Tacoma Narrows bridge collapse as an example. After the bridge collapsed because of aeroelastic flutter, the state of the art in bridge design was fundamentally changed. A statement from one of the authors of the Federal Works Agency commission's report on the collapse is telling enough that I think it bears reproducing here wholesale:

The Tacoma Narrows bridge failure has given us invaluable information...It has shown [that] every new structure [that] projects into new fields of magnitude involves new problems for the solution of which neither theory nor practical experience furnish an adequate guide. It is then that we must rely largely on judgment and if, as a result, errors, or failures occur, we must accept them as a price for human progress.

Othmar Ammann, The Failure of the Tacoma Narrows Bridge

I can find no examples of a bridge collapse attributed to aeroelastic failure after Tacoma Narrows. Action was taken to reinforce similar bridges shortly after the collapse, and aeroelastic flutter is considered today in the structural engineering of everything from buildings to airplanes.

As famous as the Tacoma Narrows collapse is in structural engineering, we can find a counterpart in software failure: the case of the Therac-25. The causes of the Therac-25 accidents are interesting enough that I suggest reading Professor Nancy Leveson's report in full, but the fundamental failure was a race condition (with many exacerbating factors). The Therac-25 killed three people and injured three more, while the Tacoma Narrows failure killed no people, and only a single dog.

Unlike Tacoma Narrows, however, the Therac-25 did nothing to fundamentally change the state of the art in software programming. Race conditions, certainly, are a known problem, and in some cases one that programming languages, techniques, and frameworks strive to solve. Even so, race condition failures in software are still so common that I need not reach past personal knowledge to find examples. Paying the "price for human progress" doesn't seem to substantially advance the state of the art in computer programming. The same class of error that made the Therac-25 lethal in 1987 led to the Northeast Blackout of 2003, affecting over 55 million people and leading to several deaths. There are countless more examples of financial losses and physical harm that can be attributed to race conditions, yet we still write programs vulnerable to this same class of bugs today.
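
To make the class of error concrete, here is a minimal, hypothetical sketch of a race condition, written in Python purely for illustration. It is not the Therac-25's code; it is just the same general species of bug: two threads perform an unprotected read-modify-write on shared state, and updates are silently lost.

```python
import threading

counter = 0  # shared state with no lock protecting it


def increment_many(n):
    """Increment the shared counter n times without any synchronization."""
    global counter
    for _ in range(n):
        current = counter       # read
        current = current + 1   # compute
        counter = current       # write back; any update another thread made
                                # between our read and this write is lost


threads = [threading.Thread(target=increment_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# We "expect" 400000, but the actual result varies from run to run and is
# usually lower: the hallmark of a race condition.
print(counter)
```

Holding a lock around the read-modify-write (or avoiding shared mutable state entirely) removes the race; the point is that the fix has been well understood for decades, and yet this species of bug keeps reappearing.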

Blame is the Enemy of Safety

Whether we have codes of conduct or not, and whether software programming is an engineering discipline or something else entirely, it seems to me that few would seriously argue that the current state of the programming world is perfect and that we have nothing to improve. As Professor Leveson would, no doubt, remind us: blame is the enemy of safety. If we, as an industry and a profession, want to build safer, more reliable systems, we must give up the idea that we can prevent accidents if we only sacrifice enough people who write bad software on the altar of "reasons for outage" and "post-mortem analysis". I cannot recommend Nancy Leveson's Engineering a Safer World strongly enough.

If we are not engineers, then perhaps we are craftspeople; if so, I hope we can become proud of our craft and treat it with the respect and care that it deserves.


Harlan Lieberman-Berg



The postings on this site are my own and do not express the positions, strategies or opinions of Akamai.
The source for this blog can be found at gitlab.com.