Goodhart’s Law and Data-Driven Decision Making

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” [Goodhart, Charles (1981). “Problems of Monetary Management: The U.K. Experience”. Anthony S. Courakis (ed.), Inflation, Depression, and Economic Policy in the West (Rowman & Littlefield): 116.]

Educators are relying more and more on data to inform their work.  Individual teachers are using hard data to make decisions about instructional practice (Formative Assessment, anyone?) and whole states are making policy decisions based on “what the data says” (Third Grade Reading Guarantee, anyone?).

The phenomenon is not unique to education, but we may not be as far down the road as other professions.  Professionals in other fields have experienced their own versions of “data-based decision making” and learned much from the experience.

In the field of economics, Charles Goodhart made the statement that opens this article.  It is often paraphrased as, “When a measure becomes a target, it ceases to be a good measure.”

For instance, here is a classic economic example.  A certain factory produces nails.  The owner of the factory finds that productivity and profits are at their peak when the factory produces 100,000 nails per week.  This becomes a target, in the form of a quota.  The factory workers begin cranking out some very small, shoddy nails that can be produced quickly.  The workers easily make the 100,000 target, but the nail count has ceased to be a good indicator, because the nails that are now being produced are useless.
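
To make the mechanism concrete, here is a minimal Python sketch of the nail factory.  Everything in it is an assumption made up for illustration: the labor time available in a slow week, the five seconds a usable nail is assumed to need, and the all-or-nothing rule for whether a nail is usable.  It is a toy model, not real factory data.

```python
# Toy model of the nail factory. All numbers are hypothetical.
LABOR_SECONDS_IN_SLOW_WEEK = 400_000  # assumed worker time available in a slow week
SECONDS_FOR_USABLE_NAIL = 5           # assumed minimum time to make a usable nail
QUOTA = 100_000                       # the weekly target from the example


def produce(seconds_per_nail):
    """Return (nails counted, nails actually usable) for one slow week."""
    counted = int(LABOR_SECONDS_IN_SLOW_WEEK // seconds_per_nail)
    # Crude assumption: a rushed nail is worthless, a careful nail is fine.
    usable = counted if seconds_per_nail >= SECONDS_FOR_USABLE_NAIL else 0
    return counted, usable


# Before the quota: workers take the time each nail needs.
before = produce(seconds_per_nail=5)

# After the quota: workers cut the time per nail until the count hits the target.
after = produce(seconds_per_nail=4)

print("before quota (counted, usable):", before)
print("after quota  (counted, usable):", after)
print("quota met after:", after[0] >= QUOTA)
```

In this toy model the counted nails rise from 80,000 to 100,000 once the quota is in place, while the usable nails drop to zero.  The count still gets reported every week, but it no longer tells the owner anything about productivity.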

Here is a hypothetical example as it might happen in a school.  A school records copious amounts of data on student behavior and discipline efforts over an entire school year.  At the end of the school year, staff members analyze the data and determine that in 30% of instances of student behavior that resulted in suspension or expulsion, the student was wearing a black shirt, and that figure is far higher than any other shirt color.  Based on this data, the school proposes a new rule for the next school year… students may no longer wear black shirts to school.

The indicator has become the target.  Rather than trying to reduce the behaviors that result in suspensions (the real target), the school has addressed the indicator (shirt color).  The underlying belief is that the target and the indicator are so tightly linked that reducing one automatically reduces the other.

The Third Grade Reading Guarantee

More than half of US states now have some form of a “third grade reading guarantee” in place.  These have been sparked by volumes of data linking overall student performance to the student’s proficiency in reading by the end of third grade.  One example is the Annie E. Casey Foundation study “Early Warning Confirmed,” which draws such conclusions as “children who do not read proficiently by the end of third grade are four times more likely to leave school without a diploma than proficient readers.” [http://www.aecf.org/~/media/Pubs/Topics/Education/Other/EarlyWarningConfirmed/EarlyWarningConfirmed.pdf, page 4.]

This is certainly actionable data.  But if we place our emphasis there, will we accomplish the real target?  Or will we just destroy the usefulness of third grade reading ability as an indicator of likelihood to graduate?  In effect, will our approach just change the color of the students’ shirts?

Correlation vs. Causation

xkcd comic about correlation and causation.
From xkcd.

If education and economics aren’t enough voices, let’s let the world of statistical science weigh in.  “Correlation” is when two sets of statistical data are closely related, so that when one changes, so does the other. “Causation” is when the existence of one circumstance causes another to happen (e.g., when I push the power button on my laptop, that causes it to turn on).  If two sets of data are correlated, we sometimes leap to the conclusion that one of them is causing the other.  We look for an explanation of the phenomenon.

Another example looks at the number of highway fatalities in the US, and the number of metric tons of lemons imported from Mexico to the US.  The accompanying graph shows that as the number of metric tons of lemons imported from Mexico increased (from 1996 to 2000), the number of US highway fatalities fell at a remarkably similar rate.  We quickly jump from looking at the data to analyzing it by asking ourselves, “Why would increasing lemon imports reduce highway fatalities?”  And many of us just as quickly conclude that no such link exists, and that the extremely close correlation is nothing but coincidence.  A data-based decision in this instance would say that if you want to reduce highway fatalities even more, then lemon imports should be increased.  And nothing in our minds tells us that makes any sense.

Graph of tons of imported lemons versus highway fatalities.
Do imported lemons reduce highway fatalities?
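
For readers who want to see how easily this happens, here is a small Python sketch.  The two lists below are invented numbers, not the real lemon or fatality figures; the only thing they share is that one drifts up and the other drifts down over five years.  The pearson function is an ordinary correlation coefficient written out by hand.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values for five consecutive years (NOT real data).
lemon_imports = [200, 280, 350, 440, 520]       # rising trend
fatality_index = [15.9, 15.7, 15.4, 15.2, 14.9]  # falling trend

r = pearson(lemon_imports, fatality_index)
print(f"correlation = {r:.2f}")
```

The result is very close to -1, about as strong as a negative correlation can be, even though the numbers were typed in with no connection between them.  Any two things that simply trend in opposite directions over the same years will look this tightly linked.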

Do What’s Right For Kids

None of these considerations should ever trump the foundational concept of “doing what is right for kids”.  Even if locking my daughter in a room by herself and giving her electric shocks while reading caused her to retain more of what she reads and score higher on standardized reading assessments, I would still fight against those methods because they are wrong.  Whether my kids are able to read at the third-grade level by the end of third grade is one indicator of their likelihood to graduate.  However, whether they have a passion for learning is an even stronger indicator of whether my kids will succeed at school, and in life.  Any approaches that hinder or squelch that passion are wrong, regardless of any indicators that they support.

The “Unmeasurable” Cause

I am quite certain that the conclusion from the Annie E. Casey Foundation about reading proficiency by the end of third grade is correct.  But I also believe that we make the mistake of taking correlation for causation, and fall deep into the pit of Goodhart’s Law, when we make third grade reading proficiency (or any other measurable indicator) our explicit target.  It may well be that the most affluent neighborhoods have a high percentage of yellow houses, but we do nothing to affect the socio-economic status of a neighborhood by requiring its residents to paint all their houses yellow, even if we provided all the materials and labor to do so.
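
The yellow-house point can be sketched in a few lines of Python.  This is a toy simulation under stated assumptions: each made-up neighborhood gets a hidden wealth score between 0 and 1, and wealthier neighborhoods are simply more likely to be yellow.  Nothing here is real housing data.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical neighborhoods: wealth is the hidden cause, house color is the indicator.
neighborhoods = []
for _ in range(10_000):
    wealth = random.random()             # assumed wealth score in [0, 1)
    is_yellow = random.random() < wealth  # wealthier -> more likely yellow
    neighborhoods.append({"wealth": wealth, "yellow": is_yellow})


def mean_wealth(group):
    """Average wealth score of a list of neighborhoods."""
    return sum(n["wealth"] for n in group) / len(group)


yellow_group = [n for n in neighborhoods if n["yellow"]]
other_group = [n for n in neighborhoods if not n["yellow"]]
print("mean wealth, yellow:", round(mean_wealth(yellow_group), 2))
print("mean wealth, other: ", round(mean_wealth(other_group), 2))

# The "policy": paint every house yellow. Wealth is never touched.
wealth_before = mean_wealth(neighborhoods)
for n in neighborhoods:
    n["yellow"] = True
wealth_after = mean_wealth(neighborhoods)

print("mean wealth before painting:", round(wealth_before, 2))
print("mean wealth after painting: ", round(wealth_after, 2))
```

Before the policy, yellow neighborhoods are clearly wealthier on average, so color looks like a useful indicator.  After the policy, every neighborhood is yellow, average wealth is exactly what it was, and color can no longer tell us anything.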

Reading at third grade level by the end of third grade, graduating high school, and a host of other academic achievements are all attributable to a passion for learning.  That passion is largely not measurable, and thus it confounds the success formulas of those who wish to create whole cultures of data-based decision making, where experience, intuition, and passion are ignored when they contradict the conclusion pointed to by the almighty data.

This is not to say that data should be ignored.  Data can give us great insights into the effectiveness of instructional strategies and help us project student achievement.  Data can help us see holes where they weren’t clear before, or make us face up to deficiencies that we did not (or did not want to) acknowledge by our own subjective perceptions.  However, we cannot allow data to make decisions for us, especially when experienced educators know the conclusions to be (at best) unrelated, or (at worst) detrimental to students.

The next time you are in a district/building/teacher leadership meeting, and the data being presented says you should take a certain course of action, and you know that action is no good, remember Goodhart’s Law, and ask yourself if you’re really just telling the students to change their shirts.

 
