My notes and other stuff

2023/01/04

Leveson on Severity

I was reading Engineering a Safer World by Nancy Leveson, and it has the first description of accident severity I've liked, though it's not quite how I see severity used in my usual software context.

Specifically, she describes severities and levels neither as a measure of "the impact the incident is having right now" nor as "what's the level of response we want" (the two uses I was familiar with), but as a set of priorities to help designers negotiate goal tradeoffs when making decisions.

The first step in any safety effort involves agreeing on the types of accidents or losses to be considered. In general, the definition of an accident comes from the customer and occasionally from the government for systems that are regulated by government agencies. Other sources might be user groups, insurance companies, professional societies, industry standards, and other stakeholders. If the company or group developing the system is free to build whatever they want, then considerations of liability and the cost of accidents will come into play. Definitions of basic terms differ greatly among industries and engineering disciplines. [...] An accident is defined as:

Accident: An undesired or unplanned event that results in a loss, including loss of human life or human injury, property damage, environmental pollution, mission loss, etc.

An accident need not involve loss of life, but it does result in some loss that is unacceptable to the stakeholders. System Safety has always considered non-human losses, but for some reason, many other approaches to safety engineering have limited the definition of a loss to human death or injury. As an example of an inclusive definition, a spacecraft accident might include loss of the astronauts (if the spacecraft is manned), death or injury to support personnel or the public, non-accomplishment of the mission, major equipment damage (such as damage to launch facilities), environmental pollution of planets, and so on. An accident definition used in the design of an explorer spacecraft to characterize the icy moon of a planet in the Earth's solar system, for example, was:

Prioritizing or assigning a level of severity to the identified losses may be useful when tradeoffs among goals are required in the design process. As an example, consider an industrial robot to service the thermal tiles on the Space Shuttle[...]. The goals for the robot are (1) to inspect the thermal tiles for damage caused during launch, reentry, and transport of a Space Shuttle and (2) to apply waterproofing chemicals to the thermal tiles.

The customer may also have a safety policy that must be followed by the contractor or those designing the thermal tile servicing robot. As an example, the following is similar to a typical NASA safety policy:

General Safety Policy: All hazards related to human injury or damage to the orbiter must be eliminated or mitigated by the system design. A reasonable effort must be made to eliminate or mitigate hazards resulting at most in damage to the robot or objects in the work area. For any hazards that cannot be eliminated, the hazard analysis as well as the design features and development procedures, including any tradeoff studies, must be documented and presented to the customer for acceptance.

Within that context, severities provide a good guideline about what is acceptable and what is not, and about what you can sacrifice to maintain higher-level guarantees. That in turn helps prioritization during incidents, but as a side effect of having clearly defined core priorities, rather than as a way of picking how loud the alarm is for a higher or lower SEV.

It's something I instantly wanted to bring to work, and we had a few discussions about it. Clear goal priorities make goal conflicts a bit easier to negotiate in difficult situations, and can ensure more graceful degradation when developers align with them as well.
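
To make the idea a bit more concrete for myself, here is a minimal sketch of what "severity as a priority ordering" could look like in code. Nothing here comes from the book: the `Severity` levels, the `choose_option` function, and the option names are all hypothetical, loosely modeled on the thermal tile robot example. The assumption is that each option is tagged with the kinds of loss it risks, and that options are compared worst level first.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical loss levels; a lower number means a worse loss."""
    HUMAN_INJURY = 1
    ORBITER_DAMAGE = 2
    ROBOT_DAMAGE = 3
    MISSION_DELAY = 4


def choose_option(options):
    """Pick the option whose risked losses are least severe.

    `options` maps an option name to the set of Severity levels it risks.
    Options are compared level by level, worst level first, so an option
    that risks a worse loss always loses to one that does not.
    """
    def risk_profile(name):
        risked = options[name]
        # Tuple of booleans ordered from the worst level to the mildest.
        # False sorts before True, so "does not risk it" wins at each level.
        return tuple(level in risked for level in Severity)

    return min(options, key=risk_profile)


# Hypothetical choices available to the robot when something goes wrong.
options = {
    "keep_servicing": {Severity.ORBITER_DAMAGE},
    "emergency_stop": {Severity.ROBOT_DAMAGE, Severity.MISSION_DELAY},
    "slow_retreat": {Severity.MISSION_DELAY},
}
print(choose_option(options))
```

The point of the sketch is that nothing in it decides how loud an alarm should be. The ordering only says which goal gets sacrificed first when two of them conflict, which is the use of severity I took from the book.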