"On the last integrated simulation, 11 days before the launch of Apollo 11, a program alarm went off during the descent of the lunar module. Steve Bales was the controller in charge of guidance for the LM, and he had no idea what the alarm meant. He called an abort, with the LM 10,000 feet above the lunar surface. “I had a hard time explaining my actions” after the simulation, Bales says. “Something was going on we didn’t understand, so I thought we should abort.” The program alarms were in part debugging aids, useful to programmers as they developed the programs; they were built in to let a programmer know that the computer was overloaded, unable to finish all the tasks in its execution frame. Mission planners never expected them in real time.
After the aborted simulation, flight director Gene Kranz assembled the controllers, Garman remembers, and told them to develop a response for every program alarm. There were about 40 alarms. “Most were innocuous,” Bales says, “but about 10 were in a class requiring judgment.” For these, Garman says, “the notes we wrote were to the effect that if the alarm doesn’t happen too often and nothing else seems wrong, then the best thing is to just proceed.”
As it happens, Bales was the guidance controller on duty for Apollo 11’s landing on the moon. Exactly 316 seconds into the descent, Buzz Aldrin reported a “1202” program alarm, one of those requiring judgment. Forty seconds later the alarm repeated.
“That was a shock to our system,” says Bales. “We had 10 to 15 seconds to decide what to do. I remember Jack [Garman] talking in my ear, saying ‘It’s not coming too fast, it’s the same type we had before.’ ” Bales called “Go” to the flight director. The alarms recurred three more times before the landing. Because of this distraction (and because they had to fly past the landing site, which was strewn with boulders), the astronauts lost track of where they were, and it took mission control a few hours to pinpoint their location.
It took even longer to determine why the alarms occurred, but the source turned out to be extraneous data from the rendezvous radar. The radar had no role to play in the landing but would be used by the LM after takeoff from the moon for return to the command module. Initial mission procedures called for the radar to be shut off during the landing, but at the last minute it was decided to leave the radar on in case the landing was aborted and it was needed. What mission planners didn’t realize was that while the LM computer was busy carrying out the tasks necessary for landing, it was also processing data from the rendezvous radar.
“The computer was interrupting itself hundreds of times a second, adding and subtracting bits from memory,” says Garman. “Just the act of doing that addition and subtraction stole 15 percent of the computer’s available time.” Carrying out the tasks necessary for landing took about 85 percent of the computer’s available time, so the added work sometimes pushed the computer to the end of the cycle before all tasks were completed, triggering the alarms.
“Had the radar noise problem taken 20 percent of the computer’s time, it’s not clear we could have landed,” says Garman.
“Our software saved the mission,” Hamilton says, “because it was asynchronous—it bumped low-priority tasks. Without it, the mission would have aborted or crashed on the moon.”
Read more at https://www.airspacemag.com/space/practicing-safe-software-180962744/"
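The load figures Garman quotes and the priority behavior Hamilton describes can be illustrated with a toy model. The sketch below is not how the Apollo Guidance Computer was implemented: the task names, the per-task cost split, and the `run_cycle` function are all hypothetical, chosen only so that the landing tasks add up to the 85 percent mentioned in the article. It treats each cycle as a fixed budget of 100 units, subtracts the overhead stolen by spurious radar interrupts, and then runs tasks from highest to lowest priority, shedding whatever no longer fits and flagging an alarm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # higher value = more important, scheduled first
    cost: int      # share of one cycle, in percent (hypothetical values)


# Hypothetical split of the roughly 85 percent landing workload.
LANDING_TASKS = [
    Task("guidance", priority=3, cost=50),
    Task("throttle", priority=2, cost=25),
    Task("display", priority=1, cost=10),
]


def run_cycle(tasks, overhead=0, budget=100):
    """Schedule one cycle; return (ran, shed, alarm).

    overhead is the share of the cycle lost to interrupt handling.
    Tasks that do not fit in the remaining budget are shed, and
    alarm is True whenever at least one task was shed.
    """
    remaining = budget - overhead
    ran, shed = [], []
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if task.cost <= remaining:
            remaining -= task.cost
            ran.append(task.name)
        else:
            shed.append(task.name)
    return ran, shed, bool(shed)


if __name__ == "__main__":
    for overhead in (0, 15, 20):
        ran, shed, alarm = run_cycle(LANDING_TASKS, overhead=overhead)
        print(f"overhead={overhead}%  ran={ran}  shed={shed}  alarm={alarm}")
```

In this simplified model an overhead of 15 fills the cycle exactly, so any additional load sheds the lowest-priority task first while the higher-priority tasks keep running. The real system's timing varied from cycle to cycle, which is consistent with the article's remark that the added work only sometimes triggered the alarms.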