Everyone has been held accountable for metrics that they didn’t own the results of. Whether it was sales, profitability, or some other metric, we’ve been subject to The Tyranny of Metrics. That does not necessarily mean, however, that all metrics are bad. On the contrary, we need good metrics to realize our best performance. In The Tyranny of Metrics, Jerry Muller seeks to help us sort out how to generate better metrics, so fewer of us face the tyranny and more of us have the opportunity to realize the power of metrics.

Feedback and Best Performance

In Peak, Anders Ericsson and Robert Pool explain that, to reach the pinnacle of performance, we need to engage in purposeful practice, and that practice needs to provide us with feedback about our performance. In Flow and Finding Flow, Mihaly Csikszentmihalyi explains how we reach the optimal state he calls flow through clear goals, immediate feedback, and a balance between challenge and skill. In short, our ability to get feedback on our performance directly relates to our ability to improve that performance.

With that as our foundation, let’s quickly explore what metrics we should be capturing.

Measurement and Value

In the development of peak performance, whether in moments of flow or in becoming the best in a chosen endeavor, we’re looking for specific kinds of feedback that allow us to tune our activities. However, the kinds of metrics that are easy to get are often not that useful – and the metrics we really want can be difficult or expensive to capture. Many have noted that we’re drowning in data but starved for information. (See The Information Diet for more.) This dynamic – lots of easily generated data we don’t need and not enough of the information we want to make key decisions – is a reality of our age.

When finding metrics that are less likely to become tyrannical, we must balance what we can get easily against what will improve our ability to make decisions. The best metrics are low cost to acquire and of high decision-making value, but finding them is like looking for a needle in a haystack. We’re far more likely to find easy-to-acquire metrics that have little value.

Cost of Acquisition

Alternatively, we may find that we want metrics that are of high value but that require a significant investment to capture. Consider for a moment a customer relationship management (CRM) system. A sales team is used to managing information about customers themselves, but in a haphazard way. Mostly, sales professionals rely on their email as the record of their interactions with a customer, whether the interaction was face-to-face followed by an email summary or conducted solely via email.

Sales management wants to know how the sales team is doing, so they want to track the number of times each salesperson interacted with the customer, including the number of in-person visits, emails, and proposals generated. The effort falls on the salesperson, who must record these interactions in the CRM system – something they’ve never done before. They perceive little value in the activity: it’s just reporting, and they already have a way of keeping up with their customers.

Here, the cost of acquisition is modest – particularly with modern CRM systems that let salespeople log these interactions directly or that capture them automatically by monitoring the salesperson’s mailbox. The work to capture the information may be appropriate, but it’s not without cost, as you’ll find out if you ever implement a CRM system and listen to the resistance you get from salespeople.

Standardization

Invariably, one of the issues that comes up is how to fit the range of activities into the narrowly defined buckets that are necessary for reporting. Consider that the CRM system might be implemented such that the only contact types supported are email, in-person, and telephone. On the surface, this may make intuitive sense. However, what do you do when you hold a video conference with the customer? What if you’re in person but must conference in a member of the team who couldn’t travel to the customer? There are several variations on this theme that play out any time someone is asked to record their activities for measurement. The categories are necessary to make reporting possible, but they’re frequently frustrating to the people trying to put their activities into buckets.

If you need an example, consider the last automated operator system you encountered. Did you have to shoehorn your request into one of the six options you were given at the first menu? That’s the uneasiness people feel every time they record an activity and have only a limited number of options.
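To make the shoehorning concrete, here’s a minimal Python sketch – with hypothetical contact types, since the book doesn’t prescribe any – of what happens when a system supports only fixed buckets:

    # Hypothetical sketch: a CRM that supports only three contact types.
    ALLOWED_CONTACT_TYPES = {"email", "in-person", "telephone"}

    def record_contact(contact_type: str) -> str:
        """Force an activity into one of the supported buckets."""
        if contact_type in ALLOWED_CONTACT_TYPES:
            return contact_type
        # A video conference -- or an in-person visit with a remote
        # attendee -- has no bucket of its own, so it gets shoehorned
        # into the least-wrong category.
        return "telephone"

    print(record_contact("video-conference"))  # telephone -- the data degrades

Every activity forced through that final fallback is a small loss of information quality – exactly the standardization problem Muller describes.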

Distortion of Metrics

Perhaps the biggest challenge with metrics is that they can be distorted. Some of the distortion can be subtle and unintentional. If you’re looking for a decrease in crime, you may be tempted to code the severity of an incident lower – or fail to report it altogether. Whether this is malicious or simply a case of natural bias is an academic discussion. The reality is that the values we get from a metric are necessarily distorted by the environment they come from.

The Tyranny of Metrics offers up eight ways that metrics are distorted:

  • Measuring the most easily measurable — at the expense of what we really want
  • Measuring the simple when the desired outcome is complex
  • Measuring the inputs (only) rather than outcomes
  • Degrading information quality through standardization
  • Gaming
    • Creaming – only taking the cases that make the metrics look better
    • Focusing exclusively on the metrics while ignoring the important but unmeasured goals
  • Improving the numbers by lowering the standards
  • Improving the numbers through omission or distortion of data
  • Overt cheating

Collectively, these distortions are a major challenge for designing a set of metrics that actually helps us improve performance.

The Tyranny of Leading and Lagging

While Muller cautions us against measuring only the inputs and not the outcomes, when we’re working with people, it’s important to include both. We need to measure inputs that people feel they can change. Of course, we must also measure the outcomes to minimize the chance that the input metrics will be distorted rather than changed in ways that make a meaningful impact on the output metrics.

Said differently, we need indicators that people feel they can change, which should generally lead to the desirable business outcome. These indicators, called leading indicators, are the kinds of metrics that Muller describes as “input.” They’re critical to motivating change and ultimate performance. They’re the kind of immediate feedback that is important for peak performance.

The outcome measures are lagging measures. That is, after the leading indicator changes, at some point in the future the lagging indicator should change. In the CRM example, the assumption is that if the number of active customers, sales visits, or proposals increases, then overall sales should increase – and presumably so should the organization’s profitability.

If we were to focus only on the number of sales proposals, we’d invite the kind of gaming that happens when salespeople feel they’re being evaluated on proposals alone. They might submit more proposals – but for deals they have no chance of winning, because the metric is simply the number of proposals. They might also prioritize proposals for smaller deals – because they’re easier to write – so that their overall numbers are larger. However, when these numbers are coupled with the sales output numbers, it becomes clear when decisions are moving the leading indicator without changing the thing that’s important to the organization.
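As a rough sketch – with entirely invented names and numbers – pairing the proposal count with the closed-revenue figure makes that kind of gaming visible:

    # Hypothetical sketch: pair a leading indicator (proposals) with a
    # lagging one (closed revenue) to spot gaming. All numbers invented.
    reps = {
        "Avery": {"proposals": 12, "closed_revenue": 480_000},
        "Blake": {"proposals": 40, "closed_revenue": 150_000},
    }

    for name, stats in reps.items():
        per_proposal = stats["closed_revenue"] / stats["proposals"]
        print(f"{name}: {stats['proposals']} proposals, "
              f"${per_proposal:,.0f} closed per proposal")
    # Blake's leading indicator looks great in isolation; paired with
    # the lagging indicator, the proposals clearly aren't becoming business.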

Extending this scenario for a moment, salespeople may also target the overall sales number but do so by lowering the profitability of the deals. They may realize that if they price deals 10% lower, they’ll get more sales – but that impacts the overall profitability of the work. As a result, it may be necessary to include the profitability of deals in the mixture of metrics for the sales team to prevent them from gaming the numbers around sales volume.
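The arithmetic behind that concern is unforgiving. In a quick sketch with assumed figures – a 20% margin and a fixed cost of delivery, neither taken from the book – a 10% price cut takes half the profit out of every deal:

    # Assumed figures for illustration: cost stays fixed while price drops.
    cost, list_price = 80.0, 100.0   # a 20% margin at list price
    discounted = list_price * 0.90   # the deal priced 10% lower

    print(list_price - cost)    # 20.0 profit per deal at list price
    print(discounted - cost)    # 10.0 profit per deal -- half the profit gone
    # Sales volume would have to double just to hold total profit flat.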

What You See Is All There Is (WYSIATI)

Daniel Kahneman describes the bias of failing to consider information that isn’t in front of you as “What You See Is All There Is” (WYSIATI). While it doesn’t roll off the tongue, it does explain one of the gaming behaviors Muller warns us about – that is, people focus only on those things for which there are metrics. Anything that isn’t being measured but still needs to be done typically doesn’t get done, to the detriment of the organization.

One answer is simply to measure everything important for everyone in the organization. However, it’s relatively easy to see that this isn’t a viable option. A better answer may be, first, to create composite metrics that aggregate many measures into one that gives a more complete picture and, second, to decrease the weight placed on any single metric.
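A minimal Python sketch of such a composite – with hypothetical component metrics and weights, since neither comes from the book – might look like this:

    # Hypothetical sketch: roll several normalized metrics (0 to 1) into
    # one weighted composite so no single number dominates behavior.
    WEIGHTS = {"sales_visits": 0.25, "proposal_quality": 0.35,
               "customer_retention": 0.40}

    def composite_score(metrics: dict) -> float:
        """Weighted average of the normalized component metrics."""
        return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

    print(composite_score({"sales_visits": 0.8, "proposal_quality": 0.6,
                           "customer_retention": 0.9}))  # 0.77
    # Gaming any single component moves the composite only a little.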

Metrics for Collaboration

When it comes to finding ways to make teams more effective, there are few places where the stakes are higher than in the intelligence community. That’s where Richard Hackman shows, in Collaborative Intelligence, that there are three layers of metrics that are effective for measuring collaboration. The first is the productive output of the group that is collaborating. This is the obvious metric that everyone uses – and it’s also lagging.

The second metric is an indicator that covers the social processes the group uses to interact with one another. It’s a leading metric. The third metric is the learning and growth of the group, which is a far-leading metric. One might rightfully challenge applying the word “metric” to the criteria Hackman recommends; they’re almost too generic even to be called criteria. They’re guidelines for the kinds of behavior that you want to see.

However, that’s the point of aggregated metrics. The explanations provided here – and in Hackman’s work – are enough for people to understand what the goal is. If the guidelines are converted into a score, and people can easily find out what goes into the score, they’ll understand the intent and what they can do to meet it. That will get them where they want to go.

Replacing Human Judgement

If you want an example of how metrics can become too big and too important – thus causing people to distort them – it’s hard to think of a better one than the right-turned-wrong story of using statistics to build baseball teams. The book Moneyball exposed the world to how the Oakland Athletics used statistics – alone – to pick players with great success. Of course, now everyone is doing it – and it no longer works as well.

Any time we elevate metrics to the exclusion of human judgement, we’ve got a problem. It’s not that metrics shouldn’t inform our decisions – in some areas of our lives, we arguably don’t use them enough. It is, however, incumbent upon us to pay attention to how they’re being used. In medicine, we may not be leveraging them enough. (See Mistreated.) In education, the insistence on metrics has diverted classroom attention from analytical thinking to teaching to the test. (See The Years That Matter Most.) In finance, the use of metrics and models contributed to the financial meltdown of 2008, when people used metrics and instruments they didn’t fully understand as a replacement for judgement and common sense.

To minimize the disruptive forces that cause metrics to become distorted, we need to reduce the weight we place on them. If someone’s job depends on hitting a 25 rather than a 24 on some metric, there’s too much weight on it, and they’ll do whatever is necessary to make the number. This is what happened at Wells Fargo, where some 5,300 employees created false accounts for customers to boost their numbers.

Pay for Performance

The idea that you need to decrease – not increase – dependence on metrics presents a challenge for pay-for-performance compensation. We’ve seen the huge distortions created when CEOs’ and executives’ goals are out of alignment with the well-being of the organization, but the problem occurs at all levels. In fact, the idea that paying individuals for performance yields better performance has been largely debunked. The only exceptions are the kinds of mechanical, unfulfilling work that, for the most part, we don’t see among the creative-class professionals most of us work with. (See The Rise of the Creative Class for more on the creative class.)

For the most part, when we use metrics to pay for performance, we reduce intrinsic motivation and, ultimately, the overall performance.

The Tyranny of Metrics, then, is not that we can’t create metrics but that we need to find the right metrics, at the right levels, to drive the right long-term performance – and that is difficult, but not impossible, to do.