“Assume it’s Monday morning and you have just returned from a week-long vacation. One of your colleagues is giving you an update on their development activities last week. What information would you expect to be included in their summary?”
“How would you design metrics to automatically measure the input/output of a software developer in a given month? Why?”
1. Developers want to know about unusual events
In addition to status updates on projects, tasks and features, many developers mentioned the importance of being aware of unusual events. One developer described the ideal summary of development activity as follows:
“Work log, what functionality [has] been implemented/tested. What were the challenges. Anything out of the ordinary.”
This anything-out-of-the-ordinary theme came up many times in our study:
“We cut our developer status meetings way down, and started stand up meetings focusing on problems and new findings rather than dead boring status. Only important point is when something is not on track, going faster than expected and why.”
When we asked about what unusual events they wanted to be kept aware of, the developers described several examples:
“If a developer hasn’t committed anything in a while, his first commit after a long silence could be particularly interesting, for example, because it took him a long time to fix a bug. Also, important commits might have unusual commit messages, for example including smileys, lots of exclamation marks or something like that. Basically something indicating that the developer was emotional about that particular commit.”
Another developer added:
“Changes to files that haven’t been changed in a long time or changes to a large number of files, a large number of deletions.”
Based on this feedback, we have started working on tool support for detecting unusual events in software projects. A first prototype for the detection of unusual commits is available online [demo, source code, paper]. We are in the process of expanding this work to detect unusual events related to issues, pull requests, and other artifacts.
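To make the kinds of signals developers described more concrete, here is a minimal sketch of rule-based checks for a single commit. It is not the prototype's actual implementation. The `Commit` fields, the thresholds, and the smiley pattern are all assumptions chosen for illustration, and a real detector would more likely learn what counts as "unusual" from each project's history.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Commit:
    # Hypothetical record; field names are illustrative only.
    author: str
    timestamp: datetime
    message: str
    files_changed: int
    deletions: int


# Assumed thresholds; sensible values depend on the project.
LONG_SILENCE = timedelta(days=30)
MANY_FILES = 50
MANY_DELETIONS = 1000

# Crude pattern: repeated exclamation marks or a simple ASCII smiley.
EMOTIONAL = re.compile(r"!{2,}|(?:^|\s)[:;]-?[)(DP]")


def unusual_reasons(commit: Commit, previous: Optional[Commit]) -> List[str]:
    """Return the reasons a commit looks unusual (empty list if none).

    `previous` is the same author's preceding commit, or None if unknown.
    """
    reasons = []
    if previous is not None and commit.timestamp - previous.timestamp > LONG_SILENCE:
        reasons.append("first commit after a long silence")
    if EMOTIONAL.search(commit.message):
        reasons.append("emotional commit message")
    if commit.files_changed > MANY_FILES:
        reasons.append("large number of files changed")
    if commit.deletions > MANY_DELETIONS:
        reasons.append("large number of deletions")
    return reasons


earlier = Commit("alice", datetime(2024, 1, 1), "Fix typo", 1, 0)
later = Commit("alice", datetime(2024, 3, 1), "Finally fixed it!!! :)", 2, 5)
print(unusual_reasons(later, earlier))
```

Returning a list of reasons rather than a single yes/no flag keeps the explanation attached to the event, which matches the developers' interest in knowing why something is out of the ordinary.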
2. Developers with more experience see less value in using method names or code comments in summaries of development activity
In the questionnaire, we asked about a few sources that could potentially be used to generate a summary automatically. The titles of issues (opened and closed) received the highest rating, while method names and code comments received the lowest. When we divided developers based on their experience, the more experienced ones (six or more years) rated method names and code comments as significantly less important than less experienced developers did. We hypothesize that these differences can be explained by the diversity of activities performed by more experienced developers. While junior developers might only work on well-defined tasks involving few artifacts, the diversity of the work carried out by senior developers makes it more difficult to summarize their work by simply considering method names, code comments, or issue titles.
3. C developers see more value in using code comments in summaries of development activity compared to other developers
Another statistically significant difference occurred when we divided developers based on the programming languages they use on GitHub. Developers using C rated the importance of code comments in summaries significantly higher than developers who do not use C. We hypothesize that this might be related to the kinds of projects developers undertake in different languages: C might be used for more complex tasks, which require more meaningful code comments. No other programming language resulted in statistically significant differences.
4. Many developers believe that their input/output (i.e., productivity) is impossible to measure
When we asked developers to design a metric for their input/output, many of them told us that it’s impossible to measure:
“It’s difficult to measure output. Simple quantitative measures like lines of code don’t convey the difficulty of a code task. Changing the architecture or doing a conceptual refactoring may have significant impact but very little evidence on the code base.”
While some developers mentioned potential metrics such as LOC, the overall consensus was that no metric is good enough:
“Anything objective, like lines of code written, hours logged, tags completed, bugs squashed, none of them can be judged outside of the context of the work being done and deciphering the appropriate context is something that automated systems are, not surprisingly, not very good at.”
One of the main reasons for not measuring developer input/output is that metrics can be gamed:
“Automatic is pretty challenging here, as developers are the most capable people on earth to game any system you create.”
And many metrics do not reflect quality either:
“A poor quality developer may be able to close more tickets than anyone else but a high quality developer often closes fewer tickets but of those few, almost none get reopened or result in regressions. For these reasons, metrics should seek to track quality as much as they track quantity.”
5. Developers with more experience see less value in measuring input/output with LOC, bugs fixed, and complexity
We asked about several potential measures in the questionnaire, including lines of code (LOC), number of bugs fixed, and complexity. Developers with at least six years of experience rated all of these measures as significantly less suitable for measuring input/output than developers with up to five years of experience did.
6. Web developers see more value in measuring the number of bugs introduced compared to other developers
7. C developers see LOC and complexity as more suitable measures for development activity compared to other developers
On the other hand, the measures of LOC and complexity were seen as significantly more suitable by developers using C compared to those who don’t use C (on GitHub, at least). We hypothesize that this difference is due to complex programs often being written in C.
8. Developers think textual summaries of development activity could be useful, possibly augmented with numbers
Developers who talked about the difficulty of measuring development activity generally felt positive about the idea of summarizing development activity:
“It’s dangerous to measure some number & have rankings. Because that can be easily gamed. I think having summaries of what everyone did is helpful. But ranking it & assessing it is very difficult/could encourage bad habits. I think it’s better to provide the information & leave it up to the reader to interpret the level of output.”
Numbers might be used to complement text, but not the other way around:
“I think that’s probably the better approach: text first, and maybe add numbers. […] I spend about 45 minutes every Friday reviewing git diffs, just to have a clearer picture in my mind of what happened over the week. […] The automatic summary would make it harder to miss something, and easier to digest.”
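The "text first, numbers second" idea can be sketched as follows. This is only an illustration under stated assumptions: the `summarize` function and its parameters are hypothetical, and it uses issue titles as the textual source simply because they received the highest rating in the questionnaire. The numbers are appended as context and are deliberately not turned into a score or ranking.

```python
from typing import List


def summarize(
    developer: str,
    closed_issue_titles: List[str],
    opened_issue_titles: List[str],
    commit_count: int,
    files_touched: int,
) -> str:
    """Build a text-first summary; counts only augment the text."""
    lines = [f"Summary for {developer}:"]
    for title in closed_issue_titles:
        lines.append(f"- Closed: {title}")
    for title in opened_issue_titles:
        lines.append(f"- Opened: {title}")
    # Numbers come last and carry no judgment about the developer.
    lines.append(f"({commit_count} commits touching {files_touched} files)")
    return "\n".join(lines)


print(summarize("alice", ["Fix login crash"], ["Add dark mode"], 7, 12))
```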
Next steps & follow-up survey
In addition to testing the various hypotheses mentioned above, we are now in the process of designing and building the tool support that the developers in our study envisioned: a development activity summarizer that reflects usual and unusual events, supported by numbers that are intended to augment the summaries instead of pitting developers against each other. Please leave a comment below if you’re interested in this work, and consider filling out our follow-up survey on summarizing GitHub data.