What Aircrews Can Teach DevOps Teams
United Flight 232 lost all flight controls when debris from a catastrophic engine failure severed its hydraulic lines. It should have crashed with no survivors, but its flight crew worked together to bring the plane down, and 184 of the 296 people aboard survived. In contrast, Asiana Flight 214 should not have crashed at all, but it did, in large part because its flight crew couldn’t communicate effectively.
Aircrews learn a set of skills called crew resource management, or CRM. CRM involves a structured way of communicating that breaks down barriers among crew members and forces an honest evaluation of the issues raised. It encourages respect for different skill sets, while also deferring a final decision to the captain. In return, the captain listens to and respects all points of view. The intent is to foster an environment where information and solutions are shared freely.
Aircrews practice CRM, but they also practice their craft over and over again, both in flight and in simulators. The complexities of operating a commercial airliner demand that each action and reaction be second nature, almost instinctive. They also practice failures, so that when an emergency occurs, there is no panic. It’s just another day at the office.
Aircrews also live and die by checklists, sometimes literally. In a recent incident, an airliner crew forgot to engage the cabin pressurization system, a step that should have been almost automatic. Even if a crew has performed an action thousands of times, following one of their many checklists ensures that it isn’t forgotten, no matter what.
Virtually every phase of a commercial flight can be automated today. Crews can spend hours on a flight but only minutes in actual hands-on control. Automation brings consistency to operations while relieving the aircrew of the stress of minute-to-minute control. However, in an emergency, automation can confuse the aircrew and get in the way of dealing with the problem. Air France Flight 447, lost over the Atlantic Ocean in 2009, went down because the crew responded inappropriately when ice crystals blocked its airspeed sensors and the autopilot disconnected in a violent storm.
Software development and delivery are also complex functions, demanding a broadly skilled team working closely together toward a common goal. Automation is a key prerequisite for DevOps, because it is what makes continuous integration, testing, and delivery possible.
Because of these similarities, DevOps teams can learn from aircrew practices. The communication practices of CRM would work well on these teams. DevOps teams can prepare and use checklists so that no step in the workflow is forgotten. They can mitigate failures by devising failure scenarios and rehearsing them. And while DevOps relies heavily on automation, when that automation fails, teams must be able to pick up the slack and keep the application running.
Between 2009 and 2018, US passenger airlines went more than nine years without a single fatal accident. We look forward to the day when DevOps teams can perform at this level of reliability.
Peter Varhol and Gerie Owen are presenting the session Bring Your Team Home Safely: What DevOps Teams Can Learn from Aircrews at the Agile + DevOps East 2018 conference, November 4–9 in Orlando, Florida.