Paper: The Alarm Problem and Directed Attention in Dynamic Fault Management

2022/12/09

These are notes I've taken about another paper from David Woods, The Alarm Problem and Directed Attention in Dynamic Fault Management.

It mainly concerns what it dubs "the alarm problem": a set of bad behaviors common to alarm systems. These include nuisance alarms (not false alarms, but alarms that report a condition that is not considered threatening, like a smoke detector beeping when you slightly overcook your toast), alerts whose messages are ambiguous or underspecified, alarm inflation (proliferation, I assume?), and alarms that give you an update on system status rather than pointing out anomalies. There are more bad behaviors, but those are the salient ones.

The paper warns that the periods where alarms are densest are also the periods where practitioners' cognitive load and task criticality are highest. That is when alarms are supposed to help, but if they're poorly designed they'll instead distract and disrupt important tasks. The approach he pushes instead is one where the alarm system is seen as an agent that is part of the socio-technical system (humans and machines) and that attempts to direct the attention of human observers.

This perspective is important because rather than seeing the alarm as something that just tells you about important things accurately or in a timely manner, it becomes an overall cognitive task based around attention control, which becomes very contextual and needs to consider demanding activities:

A critical criterion for the design of the fault management systems is how they support practitioner attention focusing, attention switching and dynamic prioritization.

The critical point is that the challenge of fault management lies in sorting through an avalanche of raw data -- a data overload problem. This is in contrast to the view that the performance bottleneck is the difficulty of picking up subtle early indications of a fault against the background of a quiescent monitored process. While this may be the bottleneck in some cases, field studies of incidents and accidents in dynamic fault management emphasize the problem of shifting attention to potentially informative areas as many data values are changing.

[...] Shifting the focus of attention in this context does not refer to initial adoption of a focus from some neutral waiting state. In fault management, one re-orients attentional focus to a newly relevant event on a different data channel or set of channels from a previous state where attention was focused on other data channels or on other cognitive activities (such as diagnostic search, response planning, communication to other agents). Dynamic fault management demands a facility with reorienting attention rapidly to new potentially relevant stimuli.

They consider the control of attention as a skill that can be developed and trained, but also one that can be undermined. They also consider alarm signals as a message to direct attention to a specific area, topic, or condition in a monitored process. The receiver must in turn quickly evaluate (from partial information) whether to direct attention away from whatever it is they are paying attention to.

This creates a sort of contradictory position: you want to provide information, and that information needs to be processed to judge its importance, mostly so we know whether it requires attention or not. But evaluating it already requires some sort of attention. In order to tackle that, let's break down the parts of the equation.

First, an attention-directing signal ("look at this!") acts as a referrer. Its influence depends on the information it provides on a) the event and condition it refers to, and b) the context in which it happens. There's also some value in knowing about why the system thinks this event or value is meaningful.

Second, the concept of directed attention is inherently cooperative. One agent has to have some awareness of where the other agent's attention is and what they're doing, and has to get that awareness without explicit communication.

Third, for the communication to be effective and not too demanding cognitively, the attention-directing signal must use a "joint reference"—meaning an external representation of a process and its state. You can talk about a known service, a known status, a given operation, and do so effectively. If you're referring to something entirely new and never seen before, you don't actually use a reference, you have to give an explanation and this is costly.

Fourth, attention management requires the ability to manage signals: enqueue them, bundle them, ignore them. This brings us back to our contradictory position: knowing what to drop or ignore requires not ignoring the signal in the first place.
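To make the enqueue/bundle/ignore idea a bit more concrete, here's a minimal sketch in Python. It is my own illustration, not something from the paper: the `Signal` fields, the severity scale, and the choice to bundle by source are all assumptions. The point it tries to show is that even an ignored signal goes through one cheap evaluation first.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str     # hypothetical: which subsystem raised it
    condition: str  # hypothetical: what it claims is happening
    severity: int   # hypothetical scale: 0 (status update) to 3 (critical)


class SignalManager:
    """Toy triage: every signal is looked at once, cheaply, then
    ignored, bundled with related signals, or enqueued on its own."""

    def __init__(self, ignore_below: int = 1):
        self.ignore_below = ignore_below
        self.queue = deque()              # sources waiting for focal attention
        self.bundles = defaultdict(list)  # signals grouped by source

    def receive(self, signal: Signal) -> str:
        # Even dropping a signal needs this (cheap) evaluation first.
        if signal.severity < self.ignore_below:
            return "ignored"
        bundle = self.bundles[signal.source]
        bundle.append(signal)
        if len(bundle) == 1:
            # First signal from this source: one queue entry stands for the bundle.
            self.queue.append(signal.source)
            return "enqueued"
        return "bundled"

    def next_bundle(self):
        """Pop the oldest source and every signal bundled under it."""
        source = self.queue.popleft()
        return source, self.bundles.pop(source)
```

With this, a cascade of related signals from one source costs a single queue entry, and the operator picks up the whole bundle when they get to it. A real system would need a much smarter notion of "related" than "same source".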

Making this work requires something Woods describes as a preattentive process:

It is important to see the function of preattentive processes in a cognitive system as more than a simple structuring of the perceptual field for attention. It is also part of the processes involved in orienting focal attention quickly to “interesting” parts of the perceptual field. Preattentive processes are part of the coordination between orienting perceptual systems (i.e., the auditory system and peripheral vision) and focal perception and attention (e.g., foveal vision) in a changing environment where new events may require a shift in attentional focus at indeterminate times. Orienting perceptual systems are critical parts of the cognitive processes involved in noticing potentially interesting events and knowing where to look next (where to focus attention next) in natural perceptual fields.

To intuitively grasp the power of orienting perceptual functions, try this thought experiment (or better, actually do it!): put on goggles that block peripheral vision, allowing a view of only a few degrees of visual angle; now think of what it would be like to function and move about in your physical environment with this handicap [...] [T]he difficulty in performing various visual tasks under these conditions is indicative of the power of the perceptual orienting mechanisms.

In short, we're already quite good as humans at doing that sort of non-focused pre-processing and organization of data to help filter and pre-direct where to give attention next by choosing which part of the data space to focus on. An alarm designer must therefore try to build mechanisms that support preattentive processes, to strike a "balance between the rigidity necessary to ensure that potentially important environmental events do not go unprocessed and the flexibility to adapt to changing behavioral goals and circumstances."

For this to work, your preattentive signal needs to:

be capable of being picked up in parallel with other lines of reasoning
include partial information on what it refers to so the observer knows whether to shift attention or not
be cognitively light enough to assess, along with its partial information, that doing so doesn't interrupt ongoing reasoning

An example of this sort of thing was found by accident with the control rods in nuclear power plants. The position of rods was indicated by a mechanical counter, which created an audible "click" when the state of the system changed. If the rods moved faster, the clicks also went faster. A similar signal also existed with boron concentration in coolant fluids (and you may imagine hearing "how close to boiling" water is from this sound). It turns out that this let the nuclear power plant operators handle control rods in parallel with other (primarily visual) elements, and notice if the system was changing or fixed. In the end, this could create a background "normal" state where operators could pick up variations and departures from expected states.
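As a rough illustration of why the click counter works as a background signal, here's a small Python sketch of my own (not from the paper). It assumes a counter that clicks once each time the rod position crosses a whole `step`; the units and the sampling are made up. Faster movement produces more clicks per sample, and no movement produces silence.

```python
def click_times(positions, step=1.0):
    """Return the sample indices at which a mechanical-style counter would click.

    positions: successive readings of a rod position (hypothetical units).
    step: how much movement advances the counter by one (an assumption).
    """
    clicks = []
    last_count = int(positions[0] // step)
    for i, pos in enumerate(positions[1:], start=1):
        count = int(pos // step)
        # One click per counter increment crossed since the last sample.
        clicks.extend([i] * abs(count - last_count))
        last_count = count
    return clicks


slow = click_times([0, 0.5, 1.0, 1.5, 2.0])  # [2, 4]
fast = click_times([0, 1, 2, 3, 4])          # [1, 2, 3, 4]
still = click_times([2, 2, 2, 2])            # []
```

The rate information comes for free from the spacing of the clicks, which is what lets someone track it by ear while their eyes are busy elsewhere.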

Older analog alarm displays (annunciator displays) had some good properties for this as well. Annunciator displays had a fixed array of tiles, fixed in space on a board. When a change or event happened, a tile would light up. While this had a lot of weaknesses, one of the advantages was that experienced operators could end up picking up patterns where specific alerts or groups of alerts would light up specific physical locations, and so they could have an idea of what was going on from peripheral vision alone. If you put related elements together, then parts of the board gained a better spatial organization.
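Spatial dedication is easy to sketch in code. The following Python snippet is a toy of my own, with an entirely hypothetical layout and alarm names: each alarm owns one tile, the tile never moves, and related alarms share a row.

```python
# Hypothetical board: each alarm owns one tile, and the position never changes.
LAYOUT = [
    ["feed_pump_a", "feed_pump_b", "feed_flow_low"],
    ["rod_bank_1", "rod_bank_2", "rod_drift"],
]


def render(active):
    """Draw the board: '#' for a lit tile, '.' for a dark one."""
    return "\n".join(
        "".join("#" if name in active else "." for name in row)
        for row in LAYOUT
    )


print(render({"feed_pump_a", "feed_flow_low"}))
```

Because the layout is fixed, "the top row is lighting up" always means the same family of problems, which is the kind of pattern that can be recognized without reading individual tiles. A scrolling, chronological alarm list loses that property.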

It's important to point out that preattentive processes are not conscious decisions or judgments but a sort of recognition-driven process. A key factor is that you can coordinate with focal attention in existing processes around perceptual fields to help attention management.

But it's not sufficient to just plop something in your peripheral vision. Bad alerts only tell you "something is wrong here." As pointed out earlier, this is an underspecified alarm because good ones refer both to a state/event/behavior and to a reason why the signal is interesting. An example of this came from the study of computer displays that used icons representing processes, which changed hues when anomalies were detected. In dynamic settings, a fault tends to come with a cascade of disturbances, which meant alarms would tend to come up in groups, which would hide changes:

The hue coded icon display provided very little data, forcing the operator to switch to other displays as soon as any trouble at all occurred in the monitored process; in other words, it was a data sparse display. Field studies support this result. Practitioners treat systems with uninformative alarm systems as if there were only a single master caution alarm.

This can generally be improved by finding ways to increase the informativeness with partial information that can be more rapidly evaluated.

Another issue comes with nuisance alarms, which often highlight conditions that may be anomalous but turn out to be expected in the current context. These tend to require more intelligence/awareness in the alarm system about the ongoing context:

Alarms should help link a specific anomaly into the larger context of the current activities and goals of supervisory agents. What is interesting depends on practitioners’ line of reasoning and the stage of the problem solving process for handling evolving incidents. [...] [T]he context sensitivity of interrupts is the major challenge to be met for the development of effective alarm systems, just as context sensitivity is the major challenge for developing solutions that treat any data overload problem

Variations and change are the norm, so the authors recommend focusing on differences from a background, or departure from normal functions and models of expected behaviour in specific contexts.
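Here's a minimal Python sketch of what "departure from expected behaviour in a specific context" could look like, as opposed to a fixed threshold. This is my own illustration under made-up assumptions: the contexts, the measurement name, and the ranges are all hypothetical, and a real model of expected behaviour would be far richer than a range lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    low: float
    high: float


# Hypothetical expectation models, keyed by (context, measurement).
EXPECTED = {
    ("steady_state", "pressure"): Expectation(90.0, 110.0),
    ("startup", "pressure"): Expectation(20.0, 110.0),
}


def departure(context, measurement, value):
    """Return None when value is expected in this context, else a short
    message carrying the value, the context and the expected range."""
    exp = EXPECTED.get((context, measurement))
    if exp is None:
        return f"{measurement}={value}: no expectation model for context '{context}'"
    if exp.low <= value <= exp.high:
        return None
    return (
        f"{measurement}={value} outside {exp.low} to {exp.high} "
        f"expected during {context}"
    )
```

The same reading of 40.0 is silent during startup and raises a message during steady state, which is the nuisance-alarm distinction described above. The message also carries the reason the value is considered interesting, rather than just "something is wrong here."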

Other suggestions include finding representations of processes that can emphasize and capture changes and events. You may also want to take advantage of non-visual channels, or if none are available, peripheral vision channels. Specifically, analog graphical representations tend to be friendlier to peripheral access, along with spatial dedication (like with annunciator displays).

The authors conclude (after a lot of examples that I encourage people to look into if they want more info):

[A]ttentional processes function within a larger context that includes the state of the process, the state of the problem solving process, practitioner expectations, the dynamics of disturbance propagation. Considering each potentially anomalous condition in isolation and outside of the context of the demands on the practitioner will lead to the development of alarm and diagnostic systems that only exacerbate the alarm problem. [...] In aggregate, trying to make all alarms unavoidable redirectors of attention overwhelms the cognitive processes involved in control of attention [...] Alarms are examples of attention directing cognitive tools. But one must recognize that directed attention is only meaningful with respect to the larger context of other activities and other signals.