Paper: How audits fail according to incident investigations

It had been a while since I wrote fresh notes from a fresh paper, so this week I took a look at How audits fail according to accident investigations: A counterfactual logic analysis, published on January 26, 2024 by Ben Hutchinson, Sidney Dekker, and Andrew Rae.

This is a super interesting paper to me, but to know why, you first have to know about something called counterfactual reasoning in the context of incident investigations. We'll get back to counterfactual reasoning because it's real neat, but the quick version is that it's what happens when you think people should have done something else, or that if they had done things differently, the bad events would have been avoided. This is generally a no-no for folks in Resilience Engineering (again, we'll cover this soon). What the authors did here, though, was take the counterfactual reasoning written out in incident investigation reports that also mentioned safety audits (whether internal or external), and use it as a description of what investigators think auditing should be doing. By comparing that ideal with what the audits actually did, they can also figure out how audits fall short of expectations.

Very cool trick.

But let's dig a bit into the background. The first stop the authors take is around auditing and its role around incidents, which also exposes their views behind incident causation:

Disasters are “essentially organized events” that require a long incubation period of discrepant gaps in organizational, social, and psychological patterns, as well as prolonged neglect or discounting of potential signs of danger. These two elements allow accident precursors to accumulate unnoticed. Theoretically, auditing can help organizations detect accident precursors, but auditing can also provide a false assurance of safety, inadvertently allaying subjective concerns for issues where alleviating concern was not objectively warranted. While ostensibly safety audits can sensitize—or desensitize—organizations to signals of danger, little empirical evidence exists.

Since they're using incident reports, they also take the time to define the two great ends of the spectrum for incident investigators, the positivist and constructivist perspectives. As they state:

Accident analysis, from both positivist and constructivist perspectives, involves selection and prioritization. The construction of a preferred account of an accident necessarily silences and underemphasizes other competing possibilities. This process can be informed by an idealized model of what could or should have happened. In other words, investigations construct, explicitly or implicitly, what did or did not happen using a process of counterfactual reasoning against an idealized model.

What this means, in short, is that you can't cover all the things, and what you choose to cover is often informed by what a reasonable or ideal situation would have been; counterfactual reasoning helps people create various scenarios and evaluate the real-world events against them. Whether this is useful or not is still up for debate in the safety science community, where positions range from "dismissive" to "sees value but is cautious". The authors describe three camps:

  1. Dismissive of counterfactuals: because an investigation reconstructs events by breaking them down and analyzing individual components, it involves cherry-picking and story-crafting, and can't be objective. Counterfactual reasoning can narrow your focus to correcting people rather than changing systems.
  2. Middle camp: counterfactual arguments can be used to convince readers that accidents can be prevented under different circumstances. They can be useful, but more as a persuasion device backed by expert judgment than as conclusive technical proof.
  3. Literal approach: this is the default position, where counterfactual reasoning is used to represent an idealized safety model against which people may interpret events.

The authors point out that this third approach is neither explicitly explained nor defended in the literature, but is often criticized and attacked by the other camps. The paper doesn't aim to criticize counterfactual reasoning in this case, but to see whether it can be used to close the gap between what people idealize and the real world in terms of safety.

This idealized picture is why counterfactuals are important here: they can be used to define it. The authors did so by looking at the last 30 or so years of incident reports (14,576 reports), narrowing them down to full safety investigations that mentioned auditing failures and contained counterfactuals (44 reports), and then doing some deep qualitative analysis on them.

They managed to condense their findings into one big table that covers most of them (if you want to know more about the method or the details, I do encourage you to read the article; it's open access):

Table 1, showing 4 categories and 9 sub-elements as coded by the investigation, their view as reconceptualized as counterfactuals, and the example of how they departed from expectations

Their 4 big categories are failing to understand, act, manage, or focus based on the audit results. Each category comes with multiple cited examples. They also found multiple insights regarding auditing.

The first and most numerous one is that audits tend to direct focus on documents and surface issues, rather than on elements that, in hindsight, were important, which results in surface compliance:

an “ideal” audit should delve beyond the presence of documentation and system artifacts to verify practical system functioning. [...] [Audits] can over-prioritize the collection of documents and the presence of artifacts at the expense of probing system functionality. Said differently, a gap exists between how systems are expected to function versus how systems do function. That gap is argued to drive decoupling, further widening expectation versus reality while providing a false veneer of safety.

A second one is that audits reinforce that positive view of safety by failing silently:

an “ideal” audit should not only pinpoint crucial shortcomings and alert organizations to safety risks but also distinctly signal if the audit program is underperforming – “failing loudly”.

If there's no audit, there's generally no noise made; if there's an inadequate audit, it can promise good results it is not able to deliver:

audits inadvertently allayed concerns, a type of probative blindness where subjective confidence in safety was disconnected from objective risk. At a pathological level, audits may be “inadequate to provide early warning of process safety risk”.

A third one is that audits tend to downplay hazards and minimize them:

the “ideal” audit is expected to identify and interpret hazards and issues in a way that is relevant and sensitizing to protecting health and safety and/or accurately estimate the effectiveness of risk controls.

They mention that sometimes audits do something they call "interpretive failure", where despite aiming to uncover issues, they can let them incubate for a longer time by either not finding them or by over-estimating safety arrangements. This can widen the gap between imagined and actual risk control. This leads to the fourth issue, unwarranted confidence:

Hutchinson et al. previously argued that “some plans can be symbolically powerful yet functionally weak.” Said differently, some safety artifacts can exert a sufficient influence over the subjective beliefs of people, like how “safe” people believe work to be, even though that artifact may provide little direct functional influence over hazardous work.

Basically, the audit plays the role of reassuring the stakeholders that health and safety can be properly governed even if it might not be so.

The authors state, however, that while counterfactual reasoning can be useful to uncover countermeasures against these failures, it still does not explain why people did what they did (which as far as I can tell is useful to actually find effective countermeasures). They also add that while the initial data set was large, the final number of adequate samples was somewhat limited.

They conclude that organizations may want to clarify their audit expectations, and implement mechanisms to test and monitor whether their auditing actually meets these expectations, particularly to prevent silent failures. They warn that audits may not act as robust indicators of weak or early signals, and suggest that practitioners ask whether their audits work as tools for problem-solving or as a machinery of comfort-seeking.