Even an improvement of a few percentage points translates into enormous cost savings! Here is what engineering leaders often overlook: how good is the data behind the sales pitch on cost savings?
Spoiler alert: in most cases, it's not nearly good enough.
Here is an example of how these engineering improvements are being sold.
Researchers at Stanford, a very serious institution, offer an automated tool to "Measure Your Software Organization's Engineering Output" (emphasis mine, RN).
Link: https://lnkd.in/es2qyJJp
The tool gets access to a GitHub repository and, with some metadata about the organization supplied by the user, evaluates productivity. There is no example report on the tool's landing page, so I can't say anything about the results. However, the "Engineering Output" wording in the call to action on the website is pretty telling.
Stanford is a serious organization (roughly a $10B annual operating budget), so they back the tool with a research paper: "Predicting Expert Evaluations in Software Code Reviews".
If you read it, the paper focuses on automating code reviews (at the level of code commits): it makes reviews predictable, reduces reviewer subjectivity, and estimates time and effort, complexity, maintainability, and so on.
In very simple terms, it can approximate expert reviews. And since it's automated, the idea for cost reduction is, you guessed it, to measure these metrics and act on them.
If you've been around software development, you may already see problems with the "let's put these metrics into a feedback loop" approach. The biggest problem is what to measure, and the authors picked the infamous one: "Software Organization's Engineering Output".
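To make it concrete, here is a deliberately naive sketch (in Python; all names, scores, and the cutoff are hypothetical, not taken from the paper or the tool) of what such a feedback loop often degenerates into: rank people by a single output score and flag whoever falls below a threshold.

```python
from statistics import median

# Hypothetical per-engineer "output" scores produced by some automated tool.
scores = {"alice": 14.2, "bob": 9.8, "carol": 0.7, "dave": 11.5, "erin": 13.0}

med = median(scores.values())
cutoff = 0.1 * med  # arbitrary cutoff: below 10% of the median output

# The naive feedback loop: anyone below the cutoff gets flagged.
# Note everything this ignores: mentoring, code review, on-call, design work,
# pairing, and plain measurement error in the score itself.
flagged = [name for name, score in scores.items() if score < cutoff]
print(flagged)  # ['carol']
```

One number per person, one threshold, and a career decision at the end of it. That is the loop being sold.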
Still, tools are tools. They can be useful in the right context.
But with this specific tool and methodology, things got dicey when one of the researchers claimed that "9.5% of software engineers are ghosts," "they do virtually no work," and "their performance is <0.1 of the median engineer."
He even argued that large companies are wasting money on "ghost engineers." Should they just cut the 10% of dead weight while reaping the benefits of "over 25% of new code written by AI," as Google's CEO said on an earnings call? A win-win, right?
Wrong!
Here is a good quote to use in a "the tool will fix it" situation: "Extraordinary claims require extraordinary evidence".
Engineering leads, please do not buy into these metrics. They are meaningless without evidence and context. Look instead at the underlying data and methods. Even consider hiring somebody to do a "peer review".
Questions to ask:
1. Does it have a robust dataset? No. The study looked at 70 commits from 18 authors, in Java only (see the sketch below for how little a sample that small can pin down).
2. Does the study justify its conclusions? No. It shows correlations but doesn't explain the results.
3. Does this work in different contexts? Unknown; it focuses on code commits only.
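On point 1, a back-of-the-envelope check makes the problem obvious. Here is a minimal Python sketch using the Wilson score interval; the "2 out of 18 flagged" count is purely hypothetical, chosen only to show how wide the uncertainty is at that sample size.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - margin), min(1.0, center + margin)

# 18 authors, 2 hypothetically flagged: the true rate could be anywhere
# from about 3% to about 33%.
print(wilson_interval(2, 18))

# The same ~11% rate observed across 5,000 engineers would actually be
# informative: roughly 10% to 12%.
print(wilson_interval(555, 5000))
```

If the headline numbers rest on samples this small, they simply cannot be as precise as "9.5% of engineers are ghosts."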