Poll Results: Erlang & Maintenance
A bit over a week ago I started a poll on Erlang and maintenance, the idea being to figure out what the Erlang community thinks helps them maintain software written in Erlang. I promised to publish the results so that everyone in the community could see them, which I'm doing right now.
The poll was conducted anonymously using Google Docs. It got 169 responses, and was advertised over the Erlang mailing lists, the #erlang IRC channel on freenode, my own Twitter account, and wherever people might have forwarded it to after that.
Even though the poll was anonymous, I figured it would be interesting to get a rough portrait of who answered it, to figure out how useful the data is. If only newcomers answer it, then it doesn't necessarily represent the well-established industry. The same goes for the opposite.
Hence, the first question was about how much experience people thought they had:
I have to admit that 'Beginner' and 'True Professional' are hard to quantify. They're abstract and there's no way to say if someone takes the literal 'professional' definition (earning a living with it) or went for the more abstract grey beard meaning. Thus, it's hard to say if most people who answered did so by the idea that they earn money, that they're all truly competent, or that they are under the spell of the Dunning-Kruger effect. In any case, the two former interpretations would seem to match the results from Hammer Principle where people tended to rate Erlang as a language they learned late in their career, lending credence to Erlang programmers being pros. Other preliminary questions also lead me to think it can be right. Of course that's what we Erlang programmers would want to believe!
The next question asked participants what kind of maintenance they had done in the past. Taking maintenance advice from someone who worked on legacy systems might give different results from someone who only ever worked on their own code, so this felt like a natural one to ask:
Note that people could pick more than one option in this poll. The definition of legacy system was “large software systems that we don’t know how to cope with but that are vital to our organization”. While 99% of the responses included new software, 49% included maintaining legacy systems.
Then came the questions about Erlang itself. How well did the respondents rate their knowledge of it, excluding OTP?
And the related question regarding OTP:
While self-evaluation isn't the easiest thing to do, I did prefer it to arbitrary tests (which already exist for those interested). We can see that most people tend to say they know Erlang and OTP rather well, but generally feel more confident about their knowledge of Erlang than OTP. Considering the relative complexity of releases and relups in the OTP world, this isn't too surprising.
The last question for the background info part of the poll was about what uses people made of Erlang:
Here again, people could pick more than one option. The most surprising aspect of this is that a large proportion of people use it for work, compared to what discussions in the community would suggest. 72% of the people who answered the poll use it for work. That's in fact one person more than those who use it for toy projects (122 to 121). If anything, this should be a decent additional data point against the usual 'toy language' argument heard by members of the community when discussing Erlang. A few uses worth mentioning, though too rare to get a category of their own in the graph instead of landing in Other, included academic usage.
Important Factors in Maintenance
The core of the poll was based on the following maintenance scenario: You are hired to take over an existing Erlang code base. Based on your experience, what do you think is important for you to feel comfortable and take 'ownership' of the code?
Different elements were then graded between "It makes things worse", "I don't care", "It's good to have", and "It's essential". Here are the results:
The entries have been sorted from most important to least important, using weights of -1 for "It makes things worse", 0 for "I don't care", 1 for "It's good to have", and 2 (or 3) for "It's essential". The ordering is mostly unchanged by giving more weight to the essential category.
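For anyone who wants to redo that ranking, here is a minimal sketch of the weighting described above. The answer keys, function names, and tallies are all made up for illustration; they are not the poll's real numbers, which live in the spreadsheet.

```python
# Weights described in the post: -1 (makes things worse), 0 (don't care),
# 1 (good to have), and 2 or 3 (essential).
WEIGHTS = {"worse": -1, "dont_care": 0, "good": 1, "essential": 2}


def weighted_score(counts, essential_weight=2):
    """Sum the answer counts for one item, each multiplied by its weight."""
    weights = dict(WEIGHTS, essential=essential_weight)
    return sum(weights[answer] * n for answer, n in counts.items())


def rank_items(tallies, essential_weight=2):
    """Return item names sorted from highest to lowest weighted score."""
    return sorted(
        tallies,
        key=lambda item: weighted_score(tallies[item], essential_weight),
        reverse=True,
    )


# HYPOTHETICAL tallies, invented for this example only.
example = {
    "tests": {"worse": 1, "dont_care": 5, "good": 54, "essential": 40},
    "source control": {"worse": 0, "dont_care": 2, "good": 20, "essential": 78},
    "comments in code": {"worse": 10, "dont_care": 30, "good": 50, "essential": 10},
}

print(rank_items(example))                      # essential counted as 2
print(rank_items(example, essential_weight=3))  # essential counted as 3
```

Running the ranking with both values of `essential_weight` is a quick way to check whether the ordering is sensitive to how heavily the essential answers are counted.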
In any case, it's unsurprising that source control is seen as a basic, essential need for most developers. It's also unsurprising that developers ask for time to adapt from their managers. After all, 22.5% to 57.6% of the time spent in any software project is spent trying to understand the system. According to “Software Maintenance” by Gerardo Canfora and Aniello Cimitile, 50% to 90% of maintenance time is spent trying to understand the system.
The next point is something I'm somewhat surprised to see. OTP behaviours (gen_server, gen_event, gen_fsm, supervisors, OTP applications) are seen as more essential than tests, documentation, knowing the problem domain, and everything else on the list you can read yourself (although I doubt most developers would trade all of them at once for OTP!). I expected people to consider OTP behaviours as important, but not that important. The cat is out of the bag: if you want to have people working on your projects, doing it the OTP way is a must, although you can take your time regarding releases and relups.
A few other interesting things are:
- People maintaining your software care more about high level documentation than module documentation and stuff inside the code.
- While relatively few people consider knowledge from coworkers or architects as essential, it's something people think of very positively.
- Out of the top 10 categories by importance, 9 are not technically specific to Erlang or OTP (tests are a grey area, as that entry mentioned specific frameworks). Only the use of behaviours is platform-specific in there, but it's a very important one.
- Conversely, out of the last 4 categories, only one (comments in code) is unrelated to the platform.
- All items are seen as positive by the majority of developers. What changes is how strong the positive feeling is.
More insights by users
The last question of the poll was the open ended "Any other factor you feel is very important?". Here are a few of the more common results, in no specific order:
- The presence of a code standard for an idiomatic style, helping with readability.
- Short functions.
- Using descriptive names (for functions, modules, processes, variables, etc.)
- Documentation or a process on how to build, test, and deploy the system, ideally error-free.
- Well-defined compartmentalization of code, modularity.
- A general vision of the project timeline: its history, where it's going; continuity.
Before finishing this thing, I decided to play with the data a bit until I found interesting correlations.
This might not be too surprising, but I still found this one funny. The more Erlang you know, the less you tend to rely on comments. They can even end up being seen negatively! This might mean that the more experience you have, the easier it is to perceive code as self-descriptive, but this is assuming we have causation, not just correlation.
This shows a correlation between how much knowledge of OTP one has, and how important OTP behaviours are seen when maintaining systems. If anything, this shows that people who learn OTP don't tend to regret their investment. They want more projects that use behaviours. On the other hand, you could also interpret this to say that programmers who understand OTP in-and-out tend to be pickier about what they wish to maintain.
This is the most interesting correlation to me, especially when we join it with the one right before. This shows that when people know OTP very well (and see it as essential), they tend to estimate that they need less time to adapt to a project. Similar correlations couldn't be found with reported developer experience or regular Erlang knowledge, so believing that there is a causation between the two is tempting.
If I had to give a personal guess to explain it, I'd say that the common structure of OTP behaviours and applications tends to impose stricter modularity constraints on a system, on top of wrapping common behaviours under very well known patterns. This makes it easier for the developer to dive into a code base, play around with it, and modify it without requiring a deep knowledge of all the other components of the system. This would reduce the pressure that usually comes with shorter deadlines. Then again, if this is purely correlation and there is no causation, even trying to explain this is moot.
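To look for this kind of relationship yourself, one simple approach is to group respondents by one answer and average another answer within each group. The sketch below is only one possible way to do it and not necessarily how the numbers above were obtained. The column names, the 1 to 5 knowledge scale, and the rows are all hypothetical.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Average rows[value_key] within each distinct value of rows[group_key]."""
    totals = defaultdict(lambda: [0, 0])  # group -> [running sum, count]
    for row in rows:
        bucket = totals[row[group_key]]
        bucket[0] += row[value_key]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# HYPOTHETICAL responses, invented for this example only.
# otp_knowledge: self-rated, 1 (low) to 5 (high)
# time_to_adapt: importance score, 0 (don't care) to 2 (essential)
rows = [
    {"otp_knowledge": 1, "time_to_adapt": 2},
    {"otp_knowledge": 1, "time_to_adapt": 2},
    {"otp_knowledge": 3, "time_to_adapt": 2},
    {"otp_knowledge": 3, "time_to_adapt": 1},
    {"otp_knowledge": 5, "time_to_adapt": 1},
    {"otp_knowledge": 5, "time_to_adapt": 0},
]

averages = mean_by_group(rows, "otp_knowledge", "time_to_adapt")
for level in sorted(averages):
    print(level, averages[level])
```

A steadily falling or rising average across groups hints at a correlation, but as noted above, it says nothing about causation.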
I want to play with the data
Sure thing. You can grab a copy of the LibreOffice Calc sheet I used and play with it. If you find that my interpretations are wrong, please send me an e-mail (there's a link at the bottom of the page) to let me know.
There isn't much to say. I'd like to thank everyone who participated. It would be interesting to see someone with a more formal mind than mine try to see what they can do to learn more from the community than I did.