Bill Evers has an excellent post over on his Ed Policy blog about how unreliable observational studies can be and how important it is to test claims with random-assignment research designs.
Observational studies (sometimes called epidemiological or quasi-experimental studies) do not randomly assign subjects to treatment or control conditions or use a technique that approximates random-assignment (like regression discontinuity). Instead they simply compare people who have self-selected or otherwise been assigned to receive a treatment to people who haven’t received that treatment, controlling statistically for observed differences between the two groups. The problem is that unobserved factors may really be causing any differences between the two groups, not the treatment. This is especially a problem when these unobserved factors are strongly related to whatever led to some people getting the treatment and others not.
The solution to this problem is random assignment. If subjects are assigned by lottery to receive a treatment or not, then the only difference between the two groups, on average, is whether they received the treatment. The two groups should otherwise be identical because only chance distinguishes them. Any differences between the two groups over time can be attributed to the treatment with high confidence.
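The contrast between the two designs can be made concrete with a small simulation. This sketch is my own illustration, not anything from Evers' post: an unobserved factor ("motivation") drives both self-selection into treatment and the outcome, so the naive observational comparison overstates the true effect, while a coin-flip assignment recovers it. All numbers are invented for the example.

```python
import math
import random

random.seed(0)

def simulate(randomized, n=100_000, true_effect=2.0):
    """Estimate a treatment effect with a hidden confounder ('motivation').

    Observational arm: motivated subjects are more likely to self-select
    into treatment. Randomized arm: a coin flip assigns treatment, so
    motivation is balanced across groups on average.
    Returns the difference in mean outcomes, treated minus control.
    """
    treated, control = [], []
    for _ in range(n):
        motivation = random.gauss(0, 1)          # unobserved by the analyst
        if randomized:
            gets_treatment = random.random() < 0.5
        else:
            # motivated people disproportionately seek out the treatment
            gets_treatment = random.random() < 1 / (1 + math.exp(-motivation))
        # outcome depends on BOTH motivation and the treatment itself
        outcome = (3.0 * motivation
                   + (true_effect if gets_treatment else 0.0)
                   + random.gauss(0, 1))
        (treated if gets_treatment else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)

# The observational estimate is inflated by the confounder;
# the randomized estimate lands near the true effect of 2.0.
print("naive observational estimate:", round(simulate(randomized=False), 2))
print("randomized estimate:", round(simulate(randomized=True), 2))
```

No amount of "controlling for observables" fixes the first estimate here, because motivation was never measured; randomization fixes it without measuring anything.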
If you don’t believe that research design makes a big difference, consider this table that Bill Evers provides on how much results change in the field of nutrition when random assignment (or clinical) studies are done to check on claims made by observational studies:
If we want to avoid the educational equivalent of quack medicine, we really need more random-assignment studies and we need to give the random-assignment studies we already have significantly greater weight when forming policy conclusions.
As I’ve written before, we have 10 random-assignment studies on the effects of vouchers on students who participate in those programs. Six of those ten studies show significant academic benefits for the average student receiving a voucher, and three studies show significant academic benefits for at least one major sub-group of students. One study finds no significant effects.
I believe that there are more random-assignment studies on vouchers than on any other educational policy, and there are certainly more studies with positive results. The body of positive, rigorous studies on voucher participant effects is worth keeping in mind each time a new observational (or even merely descriptive) study comes out on school choice, including the most recent report from Florida. Our opinion shouldn’t be based entirely on the latest study, especially if it lacks the rigorous design of several earlier studies.
While I’m a big fan of random assignment studies, especially in education, the table from Bill Evers’ blog is technically incorrect. A clinical trial, or any hypothesis test for that matter, never proves a claim to be “false.” All hypothesis tests can do is confirm a hypothesis or not confirm it, thereby resulting in a judgment of uncertainty. You can’t prove a negative. That’s not how science works. Instead of “false,” the Clinical Trial column in the Evers table should say “unconfirmed” for most entries. That still would be an indictment of too heavy a reliance on observational studies, but it would be a scientifically defensible one.
What the table is showing is that the results of the epidemiological studies were later found to be false when compared with the results of random-assignment research. So each time the table says “False,” this indicates that the epidemiological studies found one thing, but the random-assignment studies found the opposite.
If you really think that nothing can be proven false, how do you know anything is true?
If you want to get really technical – as it seems you do – then instead of “unconfirmed” you should at least say something like “refuted” or “disconfirmed,” since “unconfirmed” implies that the random-assignment study was inconclusive, when in fact it was not.
Quantitative random assignment studies use hypothesis tests to generate conclusions. The logic of the hypothesis test is that one tests to determine if the null hypothesis of “no effect” can be conclusively rejected and, therefore, the alternative hypothesis of “an effect” can be affirmed. It is a serious scientific and logical fallacy to treat the failure to reject the null hypothesis as proving no effect. It doesn’t prove anything except that the study didn’t confirm the alternative hypothesis of an effect. The next study might confirm it, and yours may have failed to simply because it had too few observations or some other limitation.
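The "too few observations" point is easy to demonstrate with a simulation of my own (not from the comment above; the effect size and sample sizes are invented): even when a real effect exists in every simulated world, most small studies fail to reject the null, while large studies almost always do.

```python
import math
import random

random.seed(1)

def one_study(n, effect):
    """Simulate one study with n subjects per arm and a genuinely real effect.

    Returns True if the study rejects the null of 'no effect' at the 5%
    level (two-sample z-test; outcome noise has known sd = 1).
    """
    treat = [random.gauss(effect, 1) for _ in range(n)]
    ctrl = [random.gauss(0.0, 1) for _ in range(n)]
    diff = sum(treat) / n - sum(ctrl) / n
    se = math.sqrt(2 / n)                 # standard error of the difference
    return abs(diff / se) > 1.96

def rejection_rate(n, effect=0.3, trials=2000):
    """Fraction of simulated studies that detect the (real) effect."""
    return sum(one_study(n, effect) for _ in range(trials)) / trials

# The treatment works (effect = 0.3 sd) in every run, yet underpowered
# studies usually fail to confirm it, while well-powered studies succeed.
print("small studies (n=20 per arm) that detect the effect:",
      rejection_rate(20))
print("large studies (n=500 per arm) that detect the effect:",
      rejection_rate(500))
```

A small study that fails to reject the null therefore tells you little about whether the effect exists; it may simply lack the power to see it.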
Science never proves that a relationship doesn’t exist, only that a hypothesized one likely does or we can’t be sure. Just because you don’t see something doesn’t mean it’s not there — only that you don’t see it. By your logic, not seeing something would be definitive proof that it is not there.