Paper: Four Concepts for Resilience Engineering

2023/09/30

Here is an interesting little bit from a novel I read during the summer of 2022, which had a quick note about the term “resilience.” I’m translating loosely from French:

Term borrowed from metallurgy, appropriated by pop-science psychiatrists and countless mediocre motivational speakers, resilience is one of the most overloaded words of this era. Synonymous with a capacity to overcome obstacles and to grow despite adversity for the common man, resilience rather points to the quality of materials that can return to their original form after having been hammered, burnt, twisted, or put under some tension.

To apply to humans while respecting the original etymology, we must first abandon all notions of naive optimism. The psychopath who maintains his psychological rigidity while interrogated is resilient, the drug addict who finds himself still tolerant to drug effects after a forced withdrawal is resilient, the soldier who lets himself be showered by enemy fire in a lost battle is resilient; those who show resignation are more resilient than the optimists. It is therefore not a question of reaching for the higher planes of virtue, but to be unwavering to your true nature. Mother Teresa and Adolf Hitler both represent excellent resilience models.

The novel is Ta mort à moi, and the tone is for sure cynical, but I did enjoy the heavy pessimistic contrast with resilience as used in resilience engineering, and the idea that evolving and adapting is very different from resilience, which is just about returning to your original shape regardless of whether it is good or not.

So how does resilience engineering define resilience? Well, that's this week's paper, once again by David Woods, titled Four concepts for resilience and the implications for the future of resilience engineering. The paper opens by admitting that the popularity of the term has led to confusion regarding what it means in the first place. I recall seeing other papers which held the ill-defined term as one of the biggest weaknesses of a discipline named after it. All the different uses seen around the place have been categorized into four groups by Woods: rebound, robustness, graceful extensibility, and sustained adaptability.

Rebound

Why do some communities, groups, or individuals recover from traumatic disrupting events or repeated stressors better than others to resume previous normal functioning? Most research there asserts that the difference comes from which resources and capabilities were present before the disruptions, not from what happens when surprised. The paper quotes:

the ability to deal with a crisis situation is largely dependent on the structures that have been developed before chaos arrives. The event can in some ways be considered a brutal and abrupt audit: at a moment's notice, everything that was left unprepared becomes a complex problem, and every weakness comes rushing to the forefront

A second important aspect is that research focusing on rebound cares a lot more about the fact that disruptions are surprises than about the characteristics of each individual disruption. The surprise challenges a model and forces revisions into the system.

This creates a weird effect where this line of research drives towards studying another definition of resilience (graceful extensibility): to deal with disruption, the capability to adapt has to already be there, so resilience is treated as a potential. But you can only measure that potential by validating it across disruptions, which the rebound framing doesn't like focusing on.

In short, a lot of questions about resilience are about why or how organizations rebound, but the research has mostly moved on to study systems where there is an ongoing and continual ability to adapt and adjust.

Robustness

This is generally perceived to be a conflation of resilience with another term, robustness: the ability to absorb disruptions. More robustness means your system can tolerate a broader range of disturbances while still responding effectively. Generally, robust control works, and only works, for cases where the disturbances are well-modelled.

This definition therefore remains sensitive to the question about what happens when the disturbance is outside the scope of what was modelled. The typical failure mode here is one where the system reaches its limits and suddenly collapses. Woods states that brittleness tends to just live at boundaries of robustness. Cybersecurity is an interesting domain here where you can be extremely robust to specific types of threats, but once the attack is novel, using a different approach, everything goes bad.

The naive understanding of robustness is that you can continuously expand the envelope of the stressors you can cope with. In practice, empirical research has shown that it is in fact more often a tradeoff: the things you can handle mean there are other things to which you become more fragile. This, once more, pushes towards the two latter definitions, which focus more on ways to adapt than ways to predict, because that tradeoff is more and more considered to be fundamental and unavoidable (think, for example, of heuristics and limits to attention).

Graceful Extensibility

Graceful extensibility is a sort of play on the idea of graceful degradation. Rather than asking how or why people, systems, and organizations bounce back, this line of approach asks: how do systems stretch to handle surprises? Resources are finite, environments are changing, and their boundaries shift in ways that require stretching and elasticity. A tenet here is that without the ability to stretch and adjust, your brittleness is far more severe than expected during normal operations, and generally exposed through extremely rapid collapses.

So a big question is: where's the boundary? We never know; incidents define it. There's a rate and tempo to events that let us get a glimpse of what it might be, so they can be looked at, tracked, and exercised. A common challenging scenario is how an organism that deals with "normal" challenges deals with two of them happening at once, for example, because this risks overextending the system.

The idea here is influenced by Safety-I (studying and preventing failures) vs. Safety-II (studying and enhancing successes), such that graceful extensibility can be seen as a positive attribute: how do we create a readiness-to-respond that is a strength and can be leveraged in all sorts of situations, rather than narrowing it to being the avoidance of negative effects?

Contrasted with rebounds, the approach to this is to look at past challenges, and see them as a way to gauge the potential to adapt to new surprises in the future. It also allows the idea of studying sequences and series of rebounds on a longer-term view of the system. How do they succeed and how do they fail?

Systems tend to fail when exhausting their capacity to mobilize response as disturbances grow and cascade, something dubbed decompensation. This tends to be detected when the ability to recover from a crisis takes longer and longer, which is the impending sense of a tipping point or collapse. The positive version of it is the anticipation of bottlenecks and crunches, and being able to deal with them. There are things that can be done to aid this resilience potential, but it contains its own challenges, where an organization can hinder its own capacity while trying to improve it.
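The "recovery takes longer and longer" signal can be made a bit more concrete with a small sketch. This is only an illustration and not something taken from the paper: it assumes you already record a recovery duration for each incident, in chronological order, and it fits a plain least-squares line through them. The function names and the zero threshold are hypothetical, and a rising slope is at best a hint worth investigating, not a measurement of resilience.

```python
from statistics import mean


def recovery_trend(recovery_times):
    """Least-squares slope of recovery time against incident order.

    `recovery_times` is an assumed input: one duration per incident,
    oldest first, all in the same unit (minutes, hours, ...).
    """
    n = len(recovery_times)
    if n < 2:
        raise ValueError("need at least two incidents to estimate a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(recovery_times)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, recovery_times))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def looks_like_decompensation(recovery_times, slope_threshold=0.0):
    """Flag a series whose recoveries keep getting slower.

    The threshold is an arbitrary illustrative default, not a value
    from the paper.
    """
    return recovery_trend(recovery_times) > slope_threshold


if __name__ == "__main__":
    # Hypothetical recovery durations, in minutes, for successive incidents.
    print(looks_like_decompensation([30, 45, 70, 120]))
    print(looks_like_decompensation([60, 50, 55, 40]))
```

A check like this only looks backwards, so it runs into the same limit the text describes: the past helps calibrate, but the next surprise may not follow the trend.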

This leads to the fourth definition, sustained adaptability.

Sustained Adaptability

This refers to the ability to manage/regulate adaptive capacities of systems. In short, while the past can be used to calibrate the potential for future resilience, the past is also not predictive and you can hit walls where the capacity is gone. Resilience-as-sustained-adaptability asks three questions:

what governance or architectural characteristics explain systems that succeed or fail at sustained adaptation?
what design principles and techniques would allow one to engineer a system that adapts in a sustained manner?
how would you know you're succeeding?

Expected challenges to sociotechnical systems over their life cycle include:

surprises will keep challenging boundaries
conditions and contexts will keep changing and shifting the boundaries
adaptive shortfalls will happen and people will have to step in
the factors that provide adaptability and the needs for them will shift over time
classes of changes will happen and the system as a whole will need to readjust itself and its relationships

A whole lot of the discipline is therefore interested in all the tradeoffs people make, that biological systems (or ecosystems) make, and particularly which are fundamental and how they apply to other systems as well. An agenda of this type of resilience is in managing capacities dedicated to resilience. In this perspective, it makes sense to say a system is resilient, or not, based on how well it balances all the tradeoffs, or not.

Woods states that the yield from the first two types of resilience has been low. The latter two approaches, the most positive ones, tend to provide better lines of inquiry, though the discipline is still young.