My bad opinions

2021/03/19

Surfacing Required Knowledge

Disclaimer

When I first started this blog, I had it in mind that I'd only write about things that were discrete and verifiable. I wanted to avoid doing the bullshit calls of being an Intellectual in Silicon Valley, or, say, casting entire industries through one reductive lens (coming to mind is the old post from Steve Yegge on the now-defunct Google+ about framing software engineering as political affiliation), or god forbid, writing a blog post about "how to be a good <role>" which transparently is a text that just says "here's how to be more like me."

Unfortunately, this text is about to break a crapton of these early rules. If at any point you feel "gosh this sounds like SV-style thought leadership", please feel free to kneecap me on Twitter to get me back into a reasonable place.

I've learned that it's sometimes useful to take a specific lens under which we can analyze dynamics and see where the idea pushes us, how it surfaces new perspectives, with some of them being useful and worth keeping around. Social sciences do that stuff under ideas of gender or race studies, for example, and semiotics does it around signs and how they communicate information to interpreters. Other lenses could include things such as studying social graphs and communication patterns, or advice like "follow the money."

In this post, I'm taking such a lens and trying to apply it to my experience in the tech industry to see how well it explains a few things. My perspective is going to be limited, and this post is de facto going to contain a lot of bullshit that will not ring true to readers. I will try and be careful to keep this as a way to consider alternative viewpoints to add perspectives, rather than as a new framework that ignores the existing richness of things to provide a blunt high-level metric that causes more damage than anything. It's more than likely that all of that stuff already exists and is better studied than what I came up with as well, in a discipline that has existed longer than I've been alive but that I'm unfortunately unaware of. There's also a bunch of it that's just me digesting ideas from smarter people and re-wording them clumsily.

Also final words of caution: I am going to arbitrarily use terms like "knowledge", "expertise", "experience", and "skills", and do so loosely and interchangeably. I am also going to do the same thing with terms like "education", "training", and "teaching". I am using these terms to refer to a general concept of attributes we believe people need to perform up to our expectations and the means by which we can create or augment these attributes in people.

Externalizing Required Knowledge

One of the good presentations I've seen people refer to a lot has been the one at boringtechnology.club, Choose Boring Technology. The article always made a lot of sense to me, but also annoyed me because I like Erlang a lot, and Erlang is not boring technology. It is a non-commoditized ecosystem where you can't easily reach out and grab 50 senior devs for market prices. On the other hand, I have also seen how trying to do polyglot organizations right can highlight a lot of blind spots that exist within an organization that feels safe using a single language. I've also seen cases where people stretched PostgreSQL in ways that were extremely uncomfortable, and where the same kind of wizardry was required for a mainstream tool as a niche one, and you couldn't easily hire 50 senior devs at market prices for the skill set required either.

Companies often choose technologies with the hope of easily finding a workforce for it, one that is commoditized. Put up the job ad, get seniors with 5+ years of experience on 3-year-old technology. Nobody knows where or how they were trained. They're just there, ripe for the picking, part of the environment. Countless companies would rather spend extra months looking for the proper type of seniors with the proper ability to answer the right whiteboard questions (which mostly suck at predicting how good they'll be on the job) than they would trying to train their employees up to the level they expect.

On the other end of the relationship, workers enter a perpetual rush to stay up to date on specific technologies, with the hopes of remaining employable. People are stuck playing a divination game in hopes of betting on the right stack with the right skill set such that they'll check all the boxes in a job ad that sounds like a Christmas wishlist in a letter to Recruiter Santa. Where do developers find the time to keep up to date? Free time, mostly. They do it at night, or maybe on stolen time from employers. A few are lucky enough to pick it up as they go while paid for it as their main duty, usually by their employer accepting that they'll be slower while they figure shit out with a book they've been given to help them along the way.

This cycle of building on newer stacks all the time to have a more easily hireable workforce, whose concern is to always learn newer stacks to remain employable, accelerates in a painful way, churning through burnouts and stacks. People avoid roles in stacks that are seen as less trendy because they fear it will hold them back, and yesterday's new hip thing is now the radioactive legacy of tomorrow.

The ever-growing wage inflation in tech, which has been going on for way longer than I'd have expected, likely does not help either. Average tenure at most organizations remains short as people know and expect to get far more significant raises from switching roles than waiting for internal promotion or salary adjustment cycles. Hell, it's likely hard for many to get an adjustment of their income that matches the rate of inflation when keeping their job, but easy to get wage increases in ranges as high as 30%-70% by switching roles. This is compounded when employers have salary ranges attached to seniority ladders: to keep hiring new talent at market rates without raising existing payroll costs, people with less experience get hired at more senior levels while your more senior people's levelling slows down. Hopping jobs becomes a baseline strategy in the industry, and people who are happy in their roles can see themselves at a severe disadvantage in terms of opportunity for sticking around too long.

It feels unsustainable. It wasn't always that way, and still does not need to be so. I can't comment on how feasible it is for most employers to more aggressively give raises to their current staff, keeping up with market rates in order to retain talent. On the other hand, there are strategies that can be adopted to better cope with an expected churn in your workforce. These strategies are useful at all sizes: large corporations with significant workforce and complex hiring efforts, and growing startups whose history is held in mind by an ever-shrinking fraction of veterans.

On-the-job training, where an employer takes on the duty of training the people they hire, used to be one of the most popular ways of doing things, usually with the help of mentorship and apprenticeship. Companies that historically were truly innovative actually had no choice but to work this way. If they were absolute leaders, they were the only ones able to show others how to do things for them. You couldn't expect to reach excellence without having ways to foster it; otherwise what you could do is hire the offshoots of places that knew how to do it, and follow behind them like seagulls trailing a fishing boat.

On-the-job training brings up images of taking people unskilled in your domain off the street, giving them equipment, and making them good. It feels onerous, long, and ineffective. Never mind that training or teaching well is a skill that isn't always aligned with a person's main role. That being said, and without judging whether it can be done all-or-nothing like that, it brings up the possibility of picking arbitrarily high requirements:

Those are often what companies aim to screen for when they hire, as made explicit by the lists of requirements in their job ads. They also pile on additional requirements around education, ambitions, "culture fit", experience, domain knowledge, ability to perform under pressure, and so on. My point is: when hiring someone, we set a base level we expect, and then commit to leaving the new hire enough time to ramp up and close the gap, or more ideally we commit to training them until they reach or exceed expectations.

When we raise the bar of hiring without changing anything else, we tacitly externalize the cost of levelling up people to the ecosystem at large: universities, open source communities, bootcamps, competitors, and other organizations within the industry. We expect the experience and expertise to be gained elsewhere, hopefully saving us the costs in the process.

So here's the new lens: what would be required of organizations if we wanted to actually lower the bar of hiring, and we took it on ourselves to close the gap between what we expect of high performers and the situation they're in when we hire people? Could we quantify and qualify that?

Surfacing Required Knowledge

A scary aspect of this externalization is that, like a lot of things we externalize or commodify, our dependencies become invisible. Open source software is a bit the same: those who treat it like a free buffet without regard for the sustainability of the libraries they use can suddenly become surprised when the external actors (the maintainers) vanish or move on to other things. Two reactions feel safest:

  1. You use only things that are safely externalized to avoid paying any further cost.
  2. You start participating in their ecosystems to make sure they remain sustainable.

To me, this is the rough edge to "use boring technology". Using boring technology sort of aligns you with safer externalization of software, but it reaches its limits when you start stretching common pieces of technology because your use cases are not aligned with their intended or most typical uses. Is it better to take a given database you know already and use it in weird ways only your team knows, or to add a new different storage mechanism that is used the way everybody else uses it and for which you will have lots of resources? At some point, the choice is not so obvious.

That choice is simple enough for open-source work, but it's not really an option for education and expertise. Some of the things your organization does are only going to be known by your organization. This includes domain knowledge, but also understanding your own software and organization's history. These things turn out to be far more significant than expected, and if you are not aware of it or taking steps to manage it, you're left using a haphazard strategy, in a position fragilized by surprises.

If you're providing software as a service or operating a platform for your customers, one of the most revealing questions to ask is "how could we let customers run this on-premises?" Take your whole stack, and assume you're shipping it to someone whose workforce does not have your own team's experience and knowledge. They have to run it, operate it, apply updates, everything. How would you close the gap? For most places I've worked at, this was generally unthinkable: there's too many skeletons, too many gotchas, bits we're not proud of but are still there, complex interactions you can't manage well without deep knowledge of internals, and so on.

A system is its rolled up history of adaptations to its experiences and expectations of stressors and pressures.
    - David Woods

You are dependent on a living experience embodied by your team, and it's near impossible to divorce your system's ongoing success from its people. The few places that can afford to say they can do so have usually built everything with the expectation that operators are going to be external, and that the knowledge required to run things will need to be made explicit in order to let others run it all.

Ask similar questions about your staff: if a given employee were to leave tomorrow with no knowledge transfer, how much trouble would we be in? Can we simulate that by making them take surprise paid vacation for a week? What would come up? Is there glue work they do and we don't realize?
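One rough way to start on the first of these questions is to write down who could currently handle each area alone, and look for areas that rest on a single person. The sketch below assumes a hand-maintained mapping; the area names, the people, and the `single_points_of_knowledge` helper are all hypothetical, and an inventory like this will miss most of the tacit stuff and the glue work.

```python
from collections import defaultdict

# Hypothetical inventory: area of the system -> people who could handle it alone.
# Every name in here is made up for illustration.
KNOWLEDGE_MAP = {
    "deploy pipeline": {"alice", "bob"},
    "legacy billing stack": {"carol"},
    "database failover": {"alice"},
    "on-call dashboards": {"alice", "bob", "carol"},
}


def single_points_of_knowledge(knowledge_map):
    """Return {person: sorted list of areas that only this person knows}."""
    exposure = defaultdict(list)
    for area, people in knowledge_map.items():
        if len(people) == 1:
            (only_person,) = people  # unpack the single member of the set
            exposure[only_person].append(area)
    return {person: sorted(areas) for person, areas in exposure.items()}


if __name__ == "__main__":
    report = single_points_of_knowledge(KNOWLEDGE_MAP)
    for person, areas in sorted(report.items()):
        print(f"If {person} left tomorrow, nobody else covers: {', '.join(areas)}")
```

The list it produces is only a conversation starter; the surprise vacation is still the better test.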

Are there parts of our system where we don't really know what good results are supposed to look like, and we're mostly relying on things keeping on keeping on the way they are? Who knew what "good behaviour" for the system was supposed to be when we added these features, and what happened since then? What are some good war stories in your group, and how are they being passed on?

These questions are sure to identify conceptual gaps where we have hard dependencies on what our people know, but are very likely not tracking it for real nor preparing to transfer that knowledge. So what do we do? I can think of two families of approaches: one that is structuring and explicit, and one which is about fostering the right conditions for things to happen.

An explicit structuring approach would have you map out all the things we need, or at least the important ones. Draw out lists of requirements, find the odd ones out you never really use or no longer have. What did the last person we had who knew how to run the legacy stack know? What are the weird things we do to keep things running? Some of these questions can be asked directly. These will likely have to do with the things we know we need when things go well: how we write code, test it, build it, deploy it, and check that things are fine. Which dashboards or queries or logs people look at. They're generally the things you might put in your on-boarding documents, but they rot fast.

Your tell-tale signs of that would be having internal documentation (that is up to date), internal bootcamps, a library of presentations and tutorials to help people level up, and the equivalent of game days or simulator hours for airline pilots, where you can train and get acquainted with all the complexity as part of joining. This has a cost, and the weight of this structure creates a rigidity that can limit your own adaptiveness; you need to be able to undo and adjust all of that material constantly to keep it useful, rather than treat it like precious ruins no one should disturb.

For many of the explicit structured approaches, you won't get the important stuff by just asking. Most of it is tacit knowledge that only shows up when things break. And then you will see your local experts identify the failure mode, reason about how it happened, and find ways to correct course for the system and make things work again. That is the sleeping expert knowledge that gets invoked in times of need but that we otherwise never recall explicitly. It requires time and lots of observations to catalog that stuff.

Fostering Expertise

I've mentioned earlier that I'd be loose with terminology. Here it makes sense to force a distinction between knowledge and expertise. Knowledge can be the things you know such as facts and strategies, context and history around decisions. They're somewhat easy to transmit between people. I'll define expertise as trickier, and relying on experience to easily and correctly make use of knowledge on a contextual basis. It's the difference between knowing the rules, and when to break them. It's figuring out what information not being there could mean, and how to adjust.

I believe that making information with which you build knowledge explicit can work for foundational stuff, but it's not going to be workable everywhere, and you will hit diminishing returns. Instead (or on top of some structure), there are habits you can take that will make knowledge transfer and skill improvement part of your culture in a way that is compatible with gaining experience faster as well.

Your people's expertise will show up when encountering novel situations that will challenge them. When shit hits the fan, you will see them find ways to mitigate situations, buy time, re-evaluate the problem space, create and disprove tons of hypotheses, be surprised, and collaborate on all of it until a solution can be found. Those are going to be defining moments where teammates help each other level up.

Much like a capacity to adapt or good cardio, expertise is not something you have as much as something you do. Make sure everyone gets to exercise and be in contact with it. Walk the problem space and get people to talk to each other in ways that synchronize their mental models and experience. Provide ways for people to get quick feedback for their actions, both from mentors and from observing the results of their actions.

This can come from activities such as in-depth incident investigations (not "action item factories"), chaos engineering, code reviews that aim to spread ownership, apprenticeships, lunch and learns, "war stories", or other presentations for direct dissemination, and contrasting approaches that can bring broader perspectives. Good examples of the latter can be book clubs with discussions from people in similar roles but in other teams, or getting your customer support people on dev teams and engineering staff on customer support rotations.

These are concepts likely tied to symmathesy, about which Jessica Kerr has written great articles.

I suspect that both the structured knowledge and expertise fostering approaches work best in a pair when they can feed off of each other. Having them can show that the organization values learning and teaching, and increase internal mobility and adaptiveness. Use both, and turn them into a self-reinforcing loop.

Sustainable Externalization

Some stuff we won't be able to afford internalizing. There's a lot of complexity in our stacks, and building our own corporate universities sounds less than ideal. In fact, the software industry is often seen as rather unique in how its workers act as a community, and I would hate losing that. Hillel Wayne's Crossover Project mentions:

We software engineers take the existence of nonacademic, noncorporate conferences for granted. But we’re unique here. In most fields of engineering, there are only two kinds of conferences: academic conferences and vendor trade shows. There’s nothing like Deconstruct or Pycon or !!con, a practitioner-oriented conference run solely for the joy of the craft.

What we do as a workforce is cope with the bad patterns of so many workplaces, and turn it into a strength. I'm someone without a formal computer science education who directly benefited from that openness, so I see it as rather critical. It also lets people go past the limits of what their local education system can offer, which might either be fantastic or terrible, and can act as a democratizing force.

However, implicit limits that come from "training on your own time" perpetuate structural privilege. The only reason I could do as much as I did without a formal education was that I was in a life situation that made it possible for me to learn a different language, work full time, study part time, write a technical book, and travel for work all at once. I've been fortunate in being in a context that isn't available to everyone. Education, costs of living, dependants (children or less autonomous relatives), mental and physical health factors, work schedules and proximity, and all other sorts of factors where discrimination can come up can all be structural blockers, and people having to learn in their free time won't face them on equal footing.

If we are to externalize knowledge and expertise to a broader ecosystem, I would advocate doing it sustainably: participate in the ecosystem, and make it so your people can do so during work hours. If you see a huge benefit from using the training provided by others (and their open-source code, while we're at it), find ways to take some of these savings and reinvest them in the community, even if it's just a tiny fraction of what you saved. Big industry players have already realized that their devrel and marketing pipelines can benefit from it, and smaller ones with a keen eye are already involved in local user groups.

Get your shoulder to the wheel and turn from consumer to participant, both for the sake of your employees and for the sake of your own sustainability.