Counting things is generally much easier than sharply defining the boundary between what is to be counted and what is not. Attention often turns to that problem late, when a decision is about to be made based on (or at least informed by) the metric and a shift in the definition could change the outcome.
While earlier attention is definitely better, no amount of sharpening will ever anticipate every variation that arises. That's why it is so important to use metrics in the context of the questions they are helping to answer and the goals being sought.
http://www.nytimes.com/2015/12/04/opinion/how-many-mass-shootings-are-there-really.html?_r=0
"Salesforce, you see, refuses to release code unless there’s 75% test coverage. A contract developer programming on a deadline looked at that requirement and said ..."
"In our analytics-obsessed world, it’s tempting to first ask how to measure whether something is a view, but if we take a step back and just ask what a “view” is, the answer becomes clearer. What is a view? It’s when someone watches the video. And Facebook counts views significantly before people could be said to be watching the video."
This is a very common situation: something that looks like a simple binary turns out to depend on setting a threshold on some continuous variable. At extreme values the choice is obvious, but the hard work lies in the ambiguous zone in the middle. The same is true of active users, time spent, retained users, ...
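To make the threshold effect concrete, here is a minimal sketch in Python. The watch times and thresholds are made up for illustration and are not taken from Facebook or any other platform; `count_views` is a hypothetical helper, not a real API. It shows how the same ten plays yield very different "view" counts depending on where the cutoff sits.

```python
# Hypothetical watch times, in seconds, for ten plays of one video.
# These numbers are invented for illustration only.
watch_seconds = [0.5, 1, 2, 3, 3, 4, 8, 15, 30, 90]


def count_views(durations, threshold_seconds):
    """Count plays that lasted at least threshold_seconds."""
    return sum(1 for d in durations if d >= threshold_seconds)


# The thresholds are assumptions chosen to span the ambiguous middle zone.
for threshold in (1, 3, 10, 30):
    views = count_views(watch_seconds, threshold)
    print(f"threshold {threshold:>2}s: {views} views")
```

With this data the count falls from 9 at a 1-second cutoff to 2 at a 30-second cutoff. Nobody would argue about the 90-second play or the half-second one; the disagreement is all about the plays in between.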
https://medium.com/@hankgreen/theft-lies-and-facebook-video-656b0ffed369
"Whether it’s unpaid time waiting around at the beginning or end of a shift, spending time on tasks that are unavoidable but don’t officially count, or being forced to absorb the costs of uncertainties like weather delays and sub-par sales, workers are paying the price for new technologies of measurement in the workplace."
http://www.psmag.com/business-economics/the-future-of-work-what-isnt-counted-counts
"When the statistics were publicized, some talented surgeons with higher-than-expected mortality statistics lost their operating privileges, while others, whose risk aversion had earned them lower-than-predicted rates, used the report cards to promote their services in advertisements."
Gathering and analyzing the statistics is nonetheless a good idea. Refining the comparison cohorts would be an improvement, but the first thing I'd do is reduce the visibility of the numbers. If the only person who saw a particular doctor's numbers was that doctor, he or she could decide whether the variance was due to exogenous factors or was a signal to take steps to improve. The direct link to an incentive (operating privileges) is, as usual, a driver of dysfunction.
What would you do?
http://www.nytimes.com/2015/07/22/opinion/giving-doctors-grades.html
(Shout-out to Garth Shoemaker for bringing this article to The Mole's attention.)