This document is from the WELL Gopher Server, gopher.well.sf.ca.us
For information about the WELL gopher, send e-mail to  [email protected]
			       Feb 1994
---------------------------------------------------------------------------


This document is the text and Notes for
Chapter One of _Digital Woes: Why We Should Not Depend on
Software_, by Lauren Ruth Wiener, Addison-Wesley 1993, ISBN # 0-201-62609-8.

From DIGITAL WOES: Why We Should Not Depend on Software

Copyright 1993 BY LAUREN RUTH WIENER <[email protected]> 
Book published by Addison-Wesley 1993, ISBN # 0-201-62609-8

		Chapter 1: Attack of the Killer Software (abbreviated)

I'm sitting in an airplane, looking out the window at the tops of fluffy white
clouds.  Once upon a time, this sight was not vouchsafed to humans.  It comes
to me courtesy of the commercial airline industry, one of the twentieth 
century's impressive achievements.  Computers and software have contributed 
a lot toward this experience:

In the cockpit, the pilot is using more software right now than I use in a year.
Software helps determine our position, speed, route, and altitude; keeps the 
plane in balance as fuel is consumed; interprets sensor readings and displays
their values for the pilot; manages certain aspects of the pilot's 
communications; translates some of the pilot's gestures on the controls into 
movements of the wing and tail surfaces; raises or lowers the landing gear.

The pilot is following a route, a path through the three-dimensional air space
that blankets North America eight miles thick.  A lot of other airplanes are 
buzzing around up here with us, and a collision would be calamitous; this 
airplane alone has four hundred passengers.  The air traffic controllers depend 
on software to assign our path through airspace.  The transponder on our 
airplane broadcasts its identification, and near an airport its signal is
translated into a little tag on a radar screen that includes our ID and altitude.
Altitude appears as a number because the screen is two-dimensional and cannot
reflect altitude directly.  The software has to move the tag around on the
screen in ways that reflect the airplane's movements through the air in two
dimensions, but the air traffic controllers have to reconstruct three-
dimensional reality in their heads, quickly and coolly, using the altitude
numbers.  The system involves a lot of pieces--transponders, radar, radar
screens, air traffic controllers, a computer--separated widely in space.
Some of them are moving all the time.  The action on the radar screen must keep
up with the action in the air.  This is a complex problem.  Knowing this,
I am grateful for our safe progress through the sky today, as I enjoy the sunny
cloud tops.

The airplane is holding up pretty well, too.  Computers and software were used
extensively to design it, and to design the process by which it was
manufactured.  Figuring out how to make something like a jet is an underappreciated
problem.  You have to design a machine that can fly, and also one that can be 
manufactured and maintained.  Sky and runway, the world is a rough place, 
and hundreds of thousands of parts may need replacing.  You also have to 
design the process that produces those parts, and that will get them where they 
are needed.  An enterprise such as Boeing's is an enormous consumer of 
software, computers, and programmers.

Then we have amenities.  Meals, including the kosher one for seat 3B and the 
vegetarian ones for 12A and 22E and F, just like it said in the database.  For
each of us, our own personal copy of Wings & Things, the in-flight magazine.
Bland and predictable it may be, but it took quite a system to get it there,
and software again played an extensive role.  Articles had to be commissioned
and written; the faxes flew.  It was laid out nicely on a Macintosh screen,
using an expensive page-layout software product.  The underpaid talent who
performed this task is now listening to a compact disc through wireless
headphones while making the daily backup diskette.

None of us passengers would even be here, of course, without the airline 
reservation system.  The network of computers linking travel agents with 
airlines is a Byzantine example of economic cooperation and competition in 
uneasy truce.  The economic tension between travel agents seeking good deals 
and airlines seeking to maximize profits has led to amazing software wars.  
Anyway, it got us here, and I paid $379, and the guy over there pecking at his
PowerBook paid $723.  The two women in front of me are taking a trip that 
includes this Saturday night, so they paid $119 each.  Software wars make for
Byzantine price-setting mechanisms, it seems.

To make our reservations, we all used the phone.  You lift the receiver off the 
hook and get a dial tone.  Press eleven buttons, and one phone rings out of 
151 million.  Just the person you want to speak to is on the other end (or her
answering machine or voice mail, but let's not get into that).  An amazingly 
complex, richly connected net of switches opens and closes just for you, and it
is quick about it--another impressive achievement of the century, whose most 
recent frill appears on the seatback in front of me, the AirFone.  Using the 
infrastructure of the cellular phone system, you can now send and receive 
phone calls on an airplane.

No computer invented the cute spelling of AirFone, but the embedded capital 
letter is brought to you by computer programmers, anyway.  Sometimes in a 
program, a programmer wants to name something--a variable, say--to suggest 
what it's being used for.  For example, a commodities tracking program might 
have a variable called "the price of eggs in China."  Computers want these 
things to be typed without spaces in them; compilers are fussy that way and 
must be catered to.  But programmers would like to perceive the individual 
words, not an undifferentiated smear of letters.  This problem gets solved 
several ways, and it is a commentary on the essential humanity of 
programmers that it's a matter of taste and, occasionally, pseudoreligious 
dispute.  Folks programming in C tend to use underscores, thus: 
the_price_of_eggs_in_China.  It gets awfully long to type.  To shorten it, 
some programmers will make mysterious secret names using rules they invent, 
such as the first letter of each word: tpoeic.  This gets them enthusiastically 
loathed by anyone who has to come along later and figure out how their 
programs work.  Another approach embeds capital letters: 
thePriceOfEggsInChina.  Marketing types think this looks stylish and name the
products that way.  Software is infiltrating on all fronts.
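
For the curious, here is how the three styles look side by side.  The sketch
below is Python rather than C, and the variable and its price are invented:

    # Three ways to write "the price of eggs in China" as one word,
    # since compilers will not accept spaces in a name.
    the_price_of_eggs_in_china = 0.89   # underscores, the C tradition
    tpoeic = 0.89                       # a secret initialism
    thePriceOfEggsInChina = 0.89        # embedded capitals, AirFone style

    # Six months later, the initialism tells a maintainer nothing
    # unless he or she happens to know the rule that generated it.
    print(the_price_of_eggs_in_china == thePriceOfEggsInChina)  # True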

We're descending into Portland now--the view out the window goes woolly 
gray with the famous Portland clouds, and then I see cars whizzing below us on
Marine Drive.  These days, they whiz along aided by a wide variety of software: 
computerized braking, fuel injection, suspension, cruise control, transmission, 
four-wheel steering, and maybe even navigation systems use software.  It 
astonishes me that software has so thoroughly colonized the car with
practically no public discussion.  Software makes it cheaper to manufacture
items--cars, for example--because it eliminates many specific little pieces
that must be machined to precise tolerances.  But what a decision to leave to
the manufacturers!

Below us now I see the runway.  On our behalf, a guardian angel peers into the
radar screen, while another in the tower communicates with our pilot over the
radio.  The complex, software-intensive system has worked again--thanks, 
everyone.  We are down.  I am home.  It is raining.

	THIRTEEN TALES OF DIGITAL WOE
This book is about how things can go wrong.  In the next few pages, I'll tell
you thirteen stories of things going wrong.  Sometimes the outcome is comic; 
sometimes it's tragic; sometimes it's both.  It isn't that the people who
design, build, and program computers are any less careful, competent, or
conscientious than the rest of us.  But they're only human, and digital
technology is unforgiving of human limitations.  So many details must be
tracked!  Even the tiniest error can have an enormous effect.  Of course the
stuff is tested, but the sad truth is that a properly thorough job of testing
a software program could take decades . . . centuries . . . sometimes even
millennia.

Frankly, developing software is not the easiest way to make money.  A careful 
job is expensive, and even the most careful process can leave that tiny, 
disastrous error.  So even after the product is finished and for sale,
developers issue constant upgrades to correct some of the mistakes they've been
hearing about.  Unfortunately, sometimes the upgrade is late.

Failures happen all the time.  The consequences of failure depend on what we 
were using the flaky machine for in the first place.  We put computers into all 
kinds of systems nowadays.  Here are some of the things we choose to risk:
*  reputations,
*  large sums of money,
*  democracy,
*  human lives,
*  the ecosystem that sustains us all.

And yet, some of these systems don't really benefit us much.  Some of them are
solutions to the wrong problem.

The truth is that digital technology is brittle.  It tends to break under any
but stable, predictable conditions--and those are just what we cannot provide.
Life is frequently--emphatically--unpredictable.  You can't think of everything. 

1.  Tiny Errors Can Have Large Effects

On July 22, 1962, a program with a tiny omission in an equation cost U.S. 
taxpayers $18.5 million when an Atlas-Agena rocket was destroyed in error.(1)

The rocket carried Mariner I, built to explore Venus.  The equation was used by
the computerized guidance system.  It was missing a bar: a horizontal stroke
over a symbol that signified the use of a set of averaged values, instead of
raw data.  The missing bar led the computer to decide that the rocket was
behaving erratically, although it was not.  When the computer tried to correct a
situation that required no correction, it caused actual erratic behavior and
the rocket was blown up to save the community of Cocoa Beach.  (This unhappy
duty falls on the shoulders of an unsung hero called the range safety officer,
and we are all glad he's there.)

Mariner I, all systems functioning perfectly, surprised the denizens of the 
Atlantic Ocean instead of the Venusian.
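
To see why the bar mattered, here is a sketch in Python with invented
numbers; the real guidance equations were nothing this simple.  The barred
(averaged) quantity smooths out sensor noise; without it, the correction
logic chases every jitter in the raw data:

    NOMINAL_VELOCITY = 1000.0       # invented target value

    def smoothed(readings, n=5):
        """Average of the last n readings: the 'barred' quantity."""
        recent = readings[-n:]
        return sum(recent) / len(recent)

    raw = [1001.0, 998.0, 1003.0, 997.0, 1002.0]    # noisy but steady

    print(raw[-1] - NOMINAL_VELOCITY)        # 2.0: looks "erratic"
    print(smoothed(raw) - NOMINAL_VELOCITY)  # about 0.2: steady flight

    # With the bar, the rocket is flying fine; without it, the guidance
    # system "corrects" mere noise, inducing the very erratic behavior
    # it thinks it sees.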


2.  Thorough Testing Takes Too Long

Because such tiny errors can have such large effects, even the best efforts can 
miss something that will cause a problem.  In late June and early July of 1991,
a series of outages affected telephone users in Los Angeles, San Francisco, 
Washington, D.C., Virginia, W. Virginia, Baltimore, and Greensboro, N.C.  The 
problems were caused by a telephone switching program written by a company 
called DSC Communications.  Their call-routing software had several million 
lines of programming code.  (Printed at 60 lines to a page and 500 pages to a 
volume, one million lines equals about 33 volumes of bedtime reading.)  They 
tested the program for 13 weeks, and it worked.

Then they made a tiny change--only three lines of the millions were altered in 
any way.  They didn't feel they needed to go through the entire 13 weeks of 
testing again.  They knew what that change did, and they were confident that it 
did nothing else.(2)  And presumably, the customer wanted it now.  So they
sent off their several-million-line program that differed from the tested
version by only three lines, and it crashed.  Repeatedly.  Sic transit software.


3.  Developing Software Is Not the Easiest Way to Make Money

Sometimes it's the software, and sometimes it's the process itself that is
buggy -- developing software can be a nightmare even for someone who has
succeeded at it before.

Mitch Kapor is a gentleman whose name never appears in the software press 
without the words "industry veteran" in front of it.  It's a fine title, and he 
wears it well: Mitch Kapor is the fellow who wrote the first spreadsheet for the
IBM PC.  He founded Lotus Development Corporation and made a fortune on 
Lotus 1-2-3.  Then he left Lotus "because the company had gotten too big."(3) 
In early 1988, he started another company called On Technology and began work 
on an ambitious project to make personal computers easier to use.  He got some
venture capital, hired a bunch of bright young programmers and a former Lotus
associate to oversee their work, and spent $300,000 a month for thirteen 
months.  When it became obvious that they were years away from a product, 
Mr. Kapor scaled back his ambitions considerably, starting development instead 
on a nice little product called On Location.

On Location is not a major paradigm shift in personal computing, but it is 
handy.  It would have been hard to write this book without it; it provides me
with meaningful access to over twelve megabytes of information.  To show you 
what I mean, Figure 1-1 shows a little bit of my computer's desktop.(4)  Each 
one of those little pictures represents a file; the text underneath is the
name of the file.  They all look the same, don't they?  Yet each contains all
kinds of different information.  If I am searching for a snippet about Mitch
Kapor, for example, how do I know where to look?

The answer is that I use this product.  It allows me to search through the whole
computer for any word or words, and it will tell me which files contain those
words.  Some of the files on the list still turn out to be irrelevant, but the
haystack in which I search for my needle of information is now much, much 
smaller.

This is handy, but it isn't earthshaking; it's only a modest application of 
computer technology.  And if anyone was in a position to appreciate how long 
development of this product was going to take, it ought to have been Mr. Kapor. 
He started development in April 1989 and expected to ship the product in 
November.  The target date for shipping was revised three times; the third time 
involved a full-blown management crisis, with the head of engineering 
storming out the door.  The product finally shipped at the end of the following 
February, but only by throwing four more bodies at it.  Several people quit. 
Twice they changed product direction, and twice they abandoned a feature they
had planned.  A seven-month schedule stretched to eleven months and 
involved an extra visit to the venture capitalists.

Software development projects are notorious for cost overruns, missed 
schedules, and products that do less than originally specified.  A lot of
corporate ships have foundered on the rocks.  Their captains can feel a bit better
now, though, because they're in excellent company.


4.  Even a Careful Process Can Leave a Problem

Bugs are troublesome, but so is removing them.  The process can leave detritus
that will cause serious problems when the system is in use.  On July 20, 1969,
in the critical final seconds of landing the first manned spacecraft on the
moon, Neil Armstrong was distracted by just such leftover detritus--two
"computer alarms."(5)

Apollo 11 was a software-intensive undertaking for its time, and it suffered its
share of development problems.  To debug the software running on the 
onboard computer, programmers inserted extra bits of computer code which 
they called "alarms" (nowadays they'd be "debugging aids") to help them 
determine what happened inside the computer when their programs 
misbehaved.

As preparations for launch approached maximum intensity, a programmer 
happened to mention the computer alarms to the fellow who programmed the 
simulations that trained the mission controllers.  The alarms had never been 
intended to come to the attention of anyone other than the programmers, and 
the mission controllers had never heard of them.  "We had gone through years 
of working out how in the world to fly that mission in excruciating detail,
every kind of failure condition, and never, ever, did I even know those alarms 
existed," said Bill Tindall, in charge of Mission Techniques.  Nevertheless,
the Apollo personnel had an understandable passion for thoroughness.  Even 
though the alarms could not reasonably be expected to occur during an actual 
mission, the mission controllers were promptly given simulations that 
included them.  This turned out to be fortunate; sometimes the backup works.

The onboard computer had several functions.  Its primary function was to help
land the lunar module on the moon, but it also helped it meet and dock with 
the command module in lunar orbit after leaving the moon.  Obviously, it was 
not going to perform both functions at once, so original procedures called for
flipping a switch to disable the rendezvous radar during descent.  However, 
about a month before launch, it was decided to leave the switch in a different
setting to allow the rendezvous radar to monitor the location of the command 
module during the descent.  The programmers felt it would be safer for the
crew if the rendezvous software could take over immediately, in case the landing
had to be aborted for any reason.

But one change to a complex, delicately balanced system leads to others.  In
this case, it led to too many others, too close to the launch date.  When the
extent of the changes became apparent, the software engineers decided to return
to the original procedures.  But the appropriate software changes had already
been loaded into the lunar module's computer, and it was a ticklish job to back
them out.  They decided instead to withhold the radar data from the rendezvous 
software, figuring that it therefore wouldn't track the command module during 
descent.  It seemed like the simplest solution.

But computers don't know that no angle has both a sine and cosine of 0.  As the
lunar module approached the surface of the moon, the computer gamely 
attempted the impossible task of tracking the command module with 
mathematically impossible data and landing the lunar module at the same 
time.  Both tasks proved to call for more processing than it could perform 
simultaneously, so it issued an alarm indicating an overflow.  Moments before
the historic landing, a twenty-six-year-old mission controller had to decide 
whether to abort the mission.  It was a tough call--some alarms indicated a 
serious problem, others could safely be ignored, but other factors complicated
the picture.  The mission controller had nineteen long seconds to think it over 
before deciding to continue.  Then a new alarm occurred and he had to make 
the decision all over again.

Meanwhile, during crucial moments in the lunar lander, the astronauts were 
distracted from seeing that their chosen landing site was strewn with boulders. 
With twenty-four seconds of fuel, they were maneuvering around rocks.  They 
landed with no margin for error.
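
The mathematically impossible data, by the way, is just the kind of thing a
program can be taught to notice: for any real angle, the squares of its sine
and cosine sum to one, so a pair of zeros could have come from no antenna
pointing anywhere.  A hypothetical consistency check, sketched in Python
rather than Apollo Guidance Computer assembly:

    import math

    def plausible_angle_data(sin_val, cos_val, tol=0.05):
        """Reject (sine, cosine) pairs no real angle could produce."""
        return abs(sin_val**2 + cos_val**2 - 1.0) <= tol

    print(plausible_angle_data(math.sin(0.3), math.cos(0.3)))  # True
    print(plausible_angle_data(0.0, 0.0))  # False: reject, don't track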


5.  Sometimes the Upgrade Is Late

On February 25, 1991, an Iraqi Scud missile killed twenty-eight American 
soldiers and wounded ninety-eight others in a barracks near Dhahran, Saudi 
Arabia.  The missile might have been intercepted had it not been for a bug in
the software running on the Patriot missile defense system.(6)  The bug was in
the software that acquired the target.  Here's how the system is supposed to 
work:

1.  The Patriot scans the sky with its five thousand radar elements until it 
detects a possible target.  Preliminary radar returns indicate how far away the 
target is.

2.  Subsequent radar data indicate how fast the target is going.

3.  Now the Patriot must determine its direction.  The Patriot starts by
scanning the whole sky, but it can scan more accurately and sensitively if it
can concentrate on just a small portion, called the tracking window.  Now it 
needs that improvement.  It calculates where the Scud is likely to be next, 
using calculations that depend (unsurprisingly) on being ultraprecise.  Then 
it draws the tracking window--a rectangle around the key portion of the 
sky--and scans for the Scud.  If it sees the Scud, it has acquired the target,
and can track its progress.  If it does not see the Scud, it concludes that the 
original blip was not a legitimate target after all.  It returns to scanning
the sky.

The equation used to draw the tracking window was generating an error of one 
10-millionth of a second every ten seconds.  Over time, this error accumulated
to the point where the tracking window could no longer be drawn accurately, 
causing real targets to be dismissed as spurious blips.  When the machine was
restarted, the value was reinitialized to zero.  The error started out
insignificant and gradually began to grow again.
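
The arithmetic of the failure is easy to reproduce.  The sketch below, in
Python, uses the drift actually observed (about 36-hundredths of a second
after a hundred hours) and invents a Scud speed and tracking-window size;
the point is how an error that is invisible at fourteen hours walks the
predicted position clear out of the window at a hundred:

    DRIFT_PER_SECOND = 0.36 / (100 * 3600)  # drift per second of uptime
    SCUD_SPEED = 1700.0                     # m/s, rough and invented
    WINDOW_HALF_WIDTH = 200.0               # meters, invented

    def position_error(uptime_hours):
        """Meters by which the predicted Scud position is off."""
        drift = uptime_hours * 3600 * DRIFT_PER_SECOND
        return drift * SCUD_SPEED

    for hours in (1, 14, 100):
        err = position_error(hours)
        status = "acquired" if err < WINDOW_HALF_WIDTH else "MISSED"
        print(f"{hours:3d} h up: {err:6.1f} m off center -> {status}")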

Original U. S. Army specifications called for a system that would shut down 
daily, at least, for maintenance or redeployment elsewhere.  The Army 
originally did not plan to run the system continuously for days; it was only 
supposed to run for fourteen hours at a stretch, and after fourteen hours the
error was still insignificant.  But successful systems nearly always find 
unanticipated uses, and by February 25th, the Patriot missile defense
installation near Dhahran had been running continuously for a hundred hours--
five days.  Its timing had drifted by 36-hundredths of a second, a
significant error.

The bug was noticed almost as soon as the Gulf War began.  By February 25th, it 
had actually been fixed, but the programmers at Raytheon also wanted to fix 
other bugs deemed more critical.  By the time all the bugs had been fixed --
--and a new version of the software had been copied onto tape,
--and the tape had been sent to Ft. McGuire Air Force Base,
--and then flown to Riyadh,
--and then trucked to Dhahran,
--and then loaded into the Patriot installation--
--well, by that time it was February 26th, and the dead were already dead, and 
the war was just about over.


6.  We Risk Our Reputations

In the summer of 1991, a company called National Data Retrieval of Norcross, 
Georgia, sent a representative to Norwich, Vermont, looking for names of 
people who were delinquent on their property taxes.  National Data Retrieval 
wanted this information to sell to TRW, a large credit-reporting agency.  The
town clerk showed the representative the town's receipt book.  Because of a 
misunderstanding, the representative copied down all the names in it--all the 
taxpayers of Norwich.  Back in Georgia, the names were keypunched in and 
supplied to TRW, which then began to report: "delinquent on his/her taxes" in 
response to every single query regarding a Norwich property owner.(7)

Credit information is not routinely sent to those most keenly affected, such as 
these maligned property owners.  So the information spread from computer to 
computer, trickling into many tiny rivulets of the Great Data Stream.  The town 
clerk began receiving a series of suspiciously similar inquiries, asking for 
confirmation of imaginary tax delinquencies.  It did not take her long to trace 
these queries to TRW.  After a mere week or so of phone calls, and only one
story planted in the local newspaper, someone at TRW undertook to correct their 
records.

Now suppose that TRW promptly and faithfully does so.  The barn door swings 
slowly shut.  In the meantime, how many computers have queried the credit 
status of how many Norwich residents?  Applications for loans, credit card 
transactions, even actions taken months previously can spark such queries.  
Due to one error, other computers have already received the false reports of tax
delinquencies.  TRW may correct its own records, but no Proclamation of 
Invalidity will be sent to those other computers.  Probably, no one even knows
where the data went.  Though officially dead, the zombie information stalks the 
data subjects, besmirching their data shadows and planting time bombs in their
lives, maybe forever.


7.  We Risk Financial Disaster

On Wednesday, November 20, 1985, a bug cost the Bank of New York $5 million 
when the software used to track government securities transactions from the 
Federal Reserve suddenly began to write new information on top of old.(8)  The
event occurred inside the memory of a computer; the effect was as if the
(digital) clerk failed to skip to a new line before writing down each
transaction in an endless ledger.  New transaction information was lost in the
digital equivalent of one big, inky blotch.  The Fed debited the bank for each
transaction, but the Bank of New York could not tell who owed it how much for
which securities.  After ninety minutes they managed to shut off the spigot of
incoming transactions, by which time the Bank of New York owed the Federal
Reserve $32 billion it could not collect from others.
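
The failure mode, as described, is easy to picture: a ledger index that
fails to advance, so each record lands on top of the last.  A hedged sketch
in Python follows; the bank's actual software ran on mainframes and was
surely nothing like this:

    ledger = [None] * 1000
    next_slot = 0

    def record_buggy(txn):
        # BUG: next_slot is never advanced, so each new transaction
        # overwrites the one before it; the digital inky blotch.
        ledger[next_slot] = txn

    def record_fixed(txn):
        global next_slot
        ledger[next_slot] = txn
        next_slot += 1          # skip to a new line in the ledger

    for txn in ("T-bill sale #1", "T-bill sale #2", "T-bill sale #3"):
        record_buggy(txn)

    print(ledger[:3])  # ['T-bill sale #3', None, None]; two sales lost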

A valiant effort by all concerned got them down to a debt of only $23.6 billion by
the end of the business day, whereupon a lot of people probably phoned home 
to say: "Honey, I won't be home for dinner tonight... Well, uh, probably really
late..."  Pledging all its assets as collateral, the Bank of New York borrowed
$23.6 billion from the Fed overnight and paid $5 million in interest for the 
privilege.  By Friday, the database was restored, but the bank also paid with
intangibles: for a while, it lost the confidence of investors.

Another consequence, however slight, is that an unknown number of 
econometric models received incorrect data for a couple of days, thereby
possibly skewing whatever decisions were based on them.


8.  We Risk Democracy

In the spring of 1992, the Liberal Party of Nova Scotia, Canada, held a 
convention.  They used a computerized telephone voting system to allow any 
convention delegate with a touch-tone phone to vote from home by dialing the 
telephone number for the candidate of his or her choice.(9)  (Those without 
touch-tone phones could go to any of several locations where banks of phones 
were set up.)  All registered Liberals received a PIN which, when entered, 
verified that they were entitled to vote.  A thank-you message verified that 
their votes had been recorded.  Maritime Tel & Tel, the local telephone 
company, persuaded the convention organizers that a backup voting system 
using paper ballots was unnecessary.  After all, they handled hundreds of 
thousands of calls a day.  What could go wrong?

Everything.  The software turned out to be too slow to handle the volume of 
calls, so many votes were not recorded.  In the ensuing confusion, voting was
suspended and resumed, then the deadline was extended--twice.  Some people 
reported that their PINs were rejected.  Others were able to vote more than
once.

Adding the final touch to this election-day chaos, a kid with a scanner called
up the Canadian Broadcasting Corporation and announced that he had recorded a 
cellular telephone conversation between the telephone company and the party, 
giving the results so far.  Representatives of the CBC, uncertain whether this
was a hoax, discussed whether to air his story with an executive producer--also
over a cellular telephone.  When the kid called back with a recording of _that_
conversation, the CBC decided to run the story.  Needless to say, this did not
improve matters.

A week or so later, the dust settled and the Liberal Party decided to try again.
This time they required the telephone company to post a $350,000 performance 
bond.  They also made available a backup system that allowed people to vote 
with paper ballots.  The backup system turned out to be unnecessary--the 
second time around, voting by phone worked fine.


9.   We Risk Death

In the spring and summer of 1986, two cancer patients in Galveston, Texas, died 
from radiation therapy received from the Therac-25, a computer-controlled 
radiation therapy machine manufactured by Atomic Energy of Canada, Ltd.(10)  
AECL was hardly a fly-by-night outfit--it was a crown corporation of the
Dominion of Canada, charged with managing nuclear energy for the nation.

A machine such as the Therac-25 can deliver two different kinds of radiation:
electrons or X-rays.  To deliver electrons, the target area on the patient's
body is irradiated directly with a beam of electrons of relatively low
intensity.  This works well for cancerous areas on or near the surface of the
body, such as skin cancer.  For cancers of internal organs, buried under healthy
flesh, a shield of tungsten is placed between the patient and the beam.  An
electron beam one hundred times more intense bombards the tungsten, which
absorbs the electrons and emits X-rays from its other side.  These X-rays pass
part of the way through the patient to strike the internal cancers.

What you want to avoid is the hundred-times-too-strong electron beam 
accidentally striking the patient directly, without the intervening tungsten 
shield.  This unsafe condition must be forestalled.  But it was not--under these
circumstances:

The operator selected X-rays as the desired procedure, the tungsten shield 
interposed itself, and the software prepared to send the high-intensity electron
beam.  Then the operator realized that she had made a mistake: electrons, not
X-rays, were the required therapy.  She changed the selection to electrons and
pressed the button to start treatment.  The shield moved out of the way, but the
software had not yet changed from the high- to the low-intensity beam setting
before it received the signal to start.  Events happened in the wrong order.

Previous radiation therapy machines included mechanical interlocks--when 
the tungsten target moved out of the way of the beam, it physically moved 
some component of a circuit, opening a switch and preventing the high-
intensity beam from turning on.  On the Therac-25, the target sensor went from
the tungsten directly to the computer.  Both the target position and the beam
intensity were directly and only under software control.  And the software had a
bug.
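
In software terms, the hazard is a state the system must never enter: shield
out of the beam path while the beam is still at the hundred-times setting.
Here is a hypothetical sketch in Python of the check the mechanical
interlock used to perform for free; nothing below is AECL's actual code:

    HIGH = "100x electron beam"   # intensity meant for the tungsten
    LOW = "1x electron beam"      # intensity meant for the patient

    def start_treatment(shield_in_place, beam_setting):
        # The software equivalent of the old mechanical interlock:
        # refuse to fire if the combination is unsafe.
        if beam_setting == HIGH and not shield_in_place:
            raise RuntimeError("interlock: unsafe state, beam stays off")
        return "beam on"

    # The operator corrects X-rays to electrons; the shield withdraws
    # at once, but the intensity has not yet been lowered when the
    # start signal arrives.  Events happen in the wrong order.
    try:
        start_treatment(shield_in_place=False, beam_setting=HIGH)
    except RuntimeError as err:
        print(err)    # the overdose is refused, not delivered

    # The safe electron-therapy configuration still works:
    print(start_treatment(shield_in_place=False, beam_setting=LOW))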

As if in a bad science fiction movie, the Therac-25 printed "Malfunction 54" on 
the operator's terminal as it gave each man a painful and lethal radiation 
overdose.  One died within a month of the mishap; the other became paralyzed,
then comatose, before he finally died several months later.

These were not isolated events.  Of eleven Therac-25 machines in use, four 
displayed these or similar problems.  Over a two- to three-year period, three
other serious incidents involving the software occurred in clinics in Marietta, 
Georgia; Yakima, Washington; and Ontario, Canada.


10.  We Risk the Earth's Ability to Sustain Life

It's not surprising that erroneous data can cause problems, but correct data 
doesn't guarantee that there will be _no_ problems.  As anyone who has spent
time in public policy think tanks can tell you, computer models can be created to
provide any answer you want.  One way to do it is to specify ahead of time the
answers you do not want.

In the 1970s and 1980s, NASA satellites orbiting the earth observed low ozone
readings.  The readings were so low that the software for processing satellite
results rejected them as errors.(11)  Checking to determine whether a value is
in an expected range is a common form of sanity check to include in a program.  
Such a check would be useful in a grading program, for example: if student 
grades are expected to be within the range of 0.0 to 4.0, inclusive, then
checking for that range can help you find places where someone made a mistake
entering a grade.
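
A minimal sketch of that check in Python; the 0.0-to-4.0 range is the
chapter's, and the rest is invented:

    def check_grade(value):
        """Flag entries outside the range a grade can possibly take."""
        if not 0.0 <= value <= 4.0:
            raise ValueError(f"suspect grade {value}: outside 0.0-4.0")
        return value

    for grade in (3.7, 37.0):   # the second slipped a decimal point
        try:
            print(check_grade(grade), "accepted")
        except ValueError as err:
            print(err)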

It's easy to incorporate a sanity check for a grade, because those limits are
set by people, using terms that are straightforward and unambiguous to a
computer--real numbers.  People sure don't set the ozone levels--well, not
directly.  As it turns out, we do, in a way, but that was precisely the question
NASA was investigating, and they weren't prepared to believe the answer.

In 1986, a team of earthbound British scientists reported the decline in ozone
levels.  NASA reexamined its old data and confirmed their findings.  The world
may have missed a chance to get a jump on the ozone problem by a decade or 
more--without an independent source for data, it is risky to reject a reading 
because it doesn't meet your preconceptions.  (On the other hand, maybe we just
missed an additional decade's worth of argument.)


11.  We May Not Gain Much

In February 1990, an article appeared describing a seeming reversal of 
progress:  the Washington State ferry system announced that it planned to 
replace the electronic control systems of the large, Issaquah-class ferries with
pneumatic controls.(12)  Ferries with electronic controls had rammed the dock,
left before being told to do so, or unexpectedly shifted from forward to
reverse.  The folks in charge had had enough.

Washington State Ferries is the largest ferry transportation system in the
United States; thousands of people in western Washington live on the Olympic 
Peninsula or the beautiful islands across Puget Sound from Seattle, and take the
ferries daily to and from work.  Under the circumstances, Washington State 
responsibly decided it did not need to run a poorly controlled experiment with
the latest technology.  Older pneumatic control systems, which require a 
physical connection from the control cabinet to the propellers and engine 
governors, had been doing the job before, and they'd been more reliable.


12.  We May Be Solving the Wrong Problem

In the early 1980s, General Motors embarked upon an enormous investment in 
automation.  In 1985, it opened its showcase: the new Hamtramck factory in 
Detroit, Michigan, had 50 automatic guided vehicles (AGVs) to ferry parts 
around the plant, and 260 robots to weld and paint.(13)  It turned out not to be
such a hot idea.

"...Almost a year after it was opened, all this high technology was still so
unreliable that the plant was producing only about half the 60 cars per hour
that it was supposed to make...

". . . The production lines ground to a halt for hours while technicians tried
to debug software.  When they did work, the robots often began dismembering each
other, smashing cars, spraying paint everywhere, or even fitting the wrong
equipment....  AGVs ... sometimes simply refused to move."

In his headlong rush to beat Japan to the twenty-first century, GM chairman 
Roger Smith failed to notice that GM's biggest problems lay not with its 
production processes, but with the way it treated its employees.  A thoughtful
look at his Japanese competitor revealed that the training, management, and 
motivation of workers was the source of their successes, not high technology.
Not only was the technology expensive and unreliable, it was a solution to the
wrong problem.


13.  Life Is Unpredictable

Despite President Reagan's pledge to get government off their backs, the folks
living near March Air Force Base in California had to tolerate his interference 
with their garage door openers from January 1981 to December 1988.(14)  Air 
Force One had some powerful and inadequately shielded electronics.  The 
consequences to nearby communities were never adequately explored, so when 
Ronald Reagan rode into town, his neighbors always knew.

	UNRELIABLE SYSTEMS, UNMET DESIRES 

As a society, we have our strengths.  Most houses have electricity and indoor
plumbing, most roads are in pretty good shape, and the bridges ordinarily don't
fall.  (Well, there was that one on the Connecticut Turnpike.)  People 
sometimes suggest that the problems suffered by digital systems are so extensive
because we have been building them for so short a time.  We've been building 
physical systems such as roads and bridges for centuries, the argument runs.
When we first started, doubtless people experienced this same level of failure
and frustration.

This theory is hard to test, if only because it's impossible to remember a time
when we didn't know _something_ about, for example, building bridges.  But we 
do have a good historical and photographic record of the process of learning to 
build bridges from concrete, chiefly as documented in the life, work, and 
passion of a Swiss engineer named Robert Maillart (1872-1940).(15)  In fact, the
first bridges to be built of this new material developed cracks after a few
years, but they didn't come crashing down.  By the end of his life, Maillart had
learned how to engineer such bridges reliably.  Many of them are still in use in
Switzerland today.  His later bridges are not only sturdy and reliable, they are
graceful and beautiful as well.

The software industry hasn't fared so well.  Before the end of the 1990s, we
will be able to celebrate the golden anniversary of developing commercial
software.  In 1968, at a NATO Conference on Software Engineering, software
professionals coined the term "the Software Crisis" to describe the difficulties
they were having in building reliable systems.(16)  Since then, a lot of books
and papers have been written about it, and a lot of seminars with titles such as
"Managing Complexity" have been held.  The crisis itself will soon be
celebrating its silver anniversary.

This isn't the way we thought it was going to be.  We once had a far more 
optimistic view of our ability to build and maintain complex systems.  This 
view is nicely illustrated by two science fiction stories written about sixty
years ago.  They purport to describe two opposite events--the computer that
crashes and the one that doesn't.  Yet they both describe systems far more
ambitious than any we could now hope to build, maintained for far longer than
we could ever dream of.  And as we shall see, they have other things in common
as well.


The Machines Will Take Care of Us

The first story, "Twilight," was written by the patron saint of modern science 
fiction, John W. Campbell, Jr.  It was first published in 1934.

In "Twilight", a time traveler visits a city somewhere on earth in the far
future.  He finds no one--the population of the earth is now considerably
diminished, and whole cities have been abandoned.  Nevertheless, in an
inspirational display of system robustness and reliability, the machines are up
and running:

"I don't know how long that city had been deserted.  Some of the men from the
other cities said it was a hundred and fifty thousand years.... The taxi machine
was in perfect condition, functioned at once.  It was clean, and the city was
clean and orderly.  I saw a restaurant and I was hungry....(17)"

The protagonist eats the millennia-old food, which is still wholesome, and 
cruises around the city in the taxi.  In true Campbell fashion, he stops next at
a subterranean level to watch the machinery:

"The entire lower block of the city was given over to the machines.  Thousands.
But most of them seemed idle, or, at most, running under light load.  I
recognized a telephone apparatus, and not a single signal came through.  There
was no life in the city.  Yet when I pressed a little stud beside the screen on
one side of the room, the machine began working instantly."

(By the way, that phone system has an excellent user interface.  Campbell's
hero knows exactly which button starts the system, despite missing millions of
years of cultural history.  Likewise, will those engineers who worked on the
taxi service please call their offices?  Raytheon, Boeing, the Bank of New 
York...they'd all like to talk to you.)

But Campbell's man doesn't stay forever down among the machines:

"Finally I went up to the top of the city, the upper level.  It was a paradise.
There were shrubs and trees and parks, glowing in the soft light that they had
learned to make in the very air.  They had learned it five million years or more
before.  Two million years ago they forgot.  But the machines didn't, and they
were still making it."(18)

It should be evident by now that a system that can operate for five million
years, maintaining itself without human help for two million years, is simply 
miraculous.  But it's a fruitless miracle.  The machines still function, but
the people can hardly manage to.  They are declining, energies sapped, vision
spent, victims of their success.

" The men knew how to die, and be dead, but the machines didn't," Campbell 
wrote sadly.


The Machines Won't Take Care of Us

In 1928, E. M. Forster, a writer of deeper insight, wrote a wonderful tale of
system breakdown called "The Machine Stops."  The story depicts a society in 
which each person lives in a small underground room.  All needs and wants 
are furnished by the Machine, so there is no need ever to leave one's room.  
The Machine provides ventilation, plumbing, food and drink, movable 
furniture, music, and literature.  Secondhand, machine-mediated experiences of
all kinds are universally available through a worldwide multimedia 
communication network.  Automated manufacturing and transportation 
systems provide a stunning array of commodities.  This has gone on for so long
that people remember no other life; they are utterly dependent; even 
breakdowns have been repaired automatically by the mending apparatus.

However, one day the Machine begins to disfigure its musical renditions with 
"curious gasping sighs."  Soon thereafter, the fruit gets moldy; the bath water
starts to stink; the poetry suffers from defective rhymes.  One day beds fail to
appear when sleepy people summon them.  Finally:

"It became difficult to read.  A blight entered the atmosphere and dulled its
luminosity.  At times Vashti could scarcely see across her room.  The air, too,
was foul.  Loud were the complaints, impotent the remedies, heroic the tone of
the lecturer as he cried, `Courage, courage!  What matter so long as the Machine
goes on?  To it the darkness and the light are one.'  And though things improved
again after a time, the old brilliancy was never recaptured, and humanity never 
recovered from its entrance into twilight."(19)

Ultimately, the Machine breaks down completely, and with it, the entire society 
on which it is based.  But on the earth's surface, homeless, half-barbaric
rebels still live.  Humanity is not wholly lost, after all.

Forster's story may seem to take a more modern, skeptical view of technology, 
but both stories assume a degree of reliability and robustness far beyond 
anything we can seriously imagine achieving.  We build systems representing 
only a tiny fraction of this size and complexity, and they break all the time.


Either Way, We Won't Like It

The point is not, however, that Forster or Campbell were lousy futurists.  
Everyone is a lousy futurist; Real Life(TM) is too chaotic, complex, and rich in
detail to predict.  These writers' concern is not accurate prediction, but the
human soul; their stories are about our primal yearning to be cared for.

The temptation to make machines our caretakers is a modern form of a basic 
human desire.  These stories warn of the consequences of succumbing to this 
temptation: we lose touch with our natures.  Our bodies continue living, but 
our souls die.

The scenarios are far-fetched, but the warning isn't.  The urge to let the 
machines take care of us is still with us; software, we feel, can do it. 
Software is flexible, it responds to us, it adapts to the situation.  The
digital systems we are now building really were unimaginably complex just a
decade ago.  Some of them perform functions that have never before been
performed, because they could be accomplished no other way.  With the advent of
digital systems, we seem at last to be on the verge of building machines big and
complicated and smart enough to take care of things.  This is an illusion.

Digital systems are capable of a lot of flexibility.  Many are even capable of
reasonable robustness.  They can add a lot to our lives.  But they are not 100
percent reliable, nor will they become so in the foreseeable future.  Of course,
perfection isn't always required.  It's nice to get a weather report, even if
you know you can't count on it.  But nothing less than perfection will do for 
running a nuclear power plant; the consequences of even a small failure could
be just too disastrous.

None of the software we rely on daily is bug-free.  It's natural to
want the machines to take care of us.  But it isn't wise.  As we'll see in the
next chapter, it is not in the nature of software to be bug-free.


From DIGITAL WOES: Why We Should Not Depend on Software
Notes to Chapter 1 (abbreviated)

Copyright 1993 BY LAUREN RUTH WIENER

	NOTES

1  Ceruzzi, Paul.  Beyond the Limits: Flight Enters the Computer Age. 
Cambridge, MA:  MIT Press, 1989, pp.20-23.

2  Andrews, Edmund L.  "String of Phone Failures Perplexes Companies and U.S.
Investigators." New York Times, July 3, 1991.  "Theories Narrowed in Phone
Inquiry." New York Times, July 4, 1991, p.10.  Markoff, John.  "Small Company
Scrutinized in U.S. Phone Breakdowns."  New York Times, July 5, 1991, p.C7.
Andrews, E.  "Computer Maker Says Flaw in Software Caused Phone Disruptions."
New York Times, July 10, 1991, p. A10.  Rankin, Robert E. "Telephone Failures 
Alarming." Oregonian, July 11, 1991, p. A13.  Science News "Phone glitches and
software bugs, Aug. 24, 1991, p.127.  Also, comp.risks, 12:2, 5, 6, and more.

3   Carroll, Paul B.  "Painful Birth: Creating New Software Was Agonizing Task
for Mitch Kapor Firm." The Wall Street Journal, May 11, 1990, pp. A1, A5.

4  What I see when I turn on my computer (a Macintosh, as you may have guessed)
and open a few windows.  Many of you have already guessed that my desktop is
showing me comp.risks archives.  Thank you again, Peter G. Neumann, for the
incomparable service this forum provides.

5  Murray, Charles, and Catherine Bly Cox.  Apollo: The Race to the Moon.  New
York: Simon and Schuster, 1989, pp.344-55.  The quote is from p.344.  The story
and additional analysis can be found in Ceruzzi, op. cit. pp.212-218.

6  Hughes, David.  "Tracking Software Error Likely Reason Patriot Battery
Failed to Engage Scud,"  Aviation Week and Space Technology, June 10, 1991,
pp.25-6.

7  Schwartz, John.  "Consumer Enemy No. 1" and "The Whistle-Blower Who Set TRW
Straight." Newsweek,  Oct. 28, 1991, pp.42 and 47.   Miller, Michael W.
"Credit-Report Firms Face Greater Pressure; Ask Norwich, Vt., Why." The Wall
Street Journal, Sept.23, 1991, pp.A1 and A5.  Also reported in comp.risks,
12:14,  Aug. 19, 1991.

8   Berry, John M.  "Computer Snarled N.Y. Bank: $32 Billion Overdraft Resulted
From Snafu." Washington Post, Dec. 13, 1985, p. D7, as reported in comp.risks,
1:31, Dec. 19, 1985.  Zweig, Phillip L., and Allanna Sullivan.  "A Computer Snafu
Snarls the Handling of Treasury Issues." Wall Street Journal, Nov. 25, 1985,
reprinted in Software Engineering Notes, 11:1, Jan. 1986, pp.3-4.  Also
Hopcroft, John E. and Dean B. Krafft.  "Toward better computer science," IEEE
Spectrum, Dec. 1987, pp.58-60.

9  comp.risks 13:56 and 58, Jun. 9 and 15, 1992.  Items contributed by Daniel
McKay, to whom thanks is due for his thorough reporting and thoughtful analysis.

10  Jacky,  Jonathan.  "Programmed for Disaster:  Software Errors That Imperil
Lives." The Sciences, Sept/Oct. 1989, pp. 22ff; also "Inside Risks: Risks in
Medical Electronics." Communications of the ACM, 33:12, December, 1990, p.136;
also personal communication, Seattle, WA, Jan. 14, 1991.

An excellent and thorough technical report covering all aspects of the subject
is: Leveson, Nancy G., and Clark S. Turner.  An Investigation of the Therac-25
Accidents.  Univ. of Washington Technical Report #92-11-05
(also UCI TR #92-108), Nov. 1992.

11  Forester, Tom, and Perry Morrison. Computer Ethics: Cautionary Tales and
Ethical Dilemmas in Computing.  Cambridge, MA: MIT Press, 1990, p.75.  Also,
New York Times Science section, July 29, 1986, p.C1.

12  Fitzgerald, Karen.  "Faults and Failures: Ferry Electronics Out of Control."
IEEE Spectrum, 27:2, Feb. 1990, p. 54.

13  "When GM's robots ran amok." The Economist, Aug.10, 1991, p.64-65.

14  Forester and Morrison, op. cit. p. 73.

15  See Billington, David P.  Robert Maillart's Bridges: The Art of
Engineering.  Princeton, N.J.: Princeton University Press, 1989.

16  Dijkstra, Edsger W.  "Programming Considered as a Human Activity" in
Classics in Software Engineering, Edward Yourdon, ed.  New York: Yourdon Press,
1979, p. 39.

17  "Twilight" John W. Campbell, copyright 1934 by Street and Smith
Publications, Inc.  First published in February 1935 Astounding Stories, and
reprinted in The Best of John W. Campbell. Garden City, N.Y.: Nelson Doubleday,
Inc. 1976, pp.28-29.

18  ibid.

19  Forster, E. M.  "The Machine Stops."  From The Eternal Moment and Other
Stories.  Orlando, Fla.: Harcourt Brace Jovanovich, Inc., 1928.