Statistical vs Clinical Significance: Evaluating Bias in Research
After a long evening lubricated by perhaps one too many martinis, a young man stumbles out of his favorite watering hole. Despite the wishes of his buddies, state law, and his own better judgment, he is determined to drive himself home. But alas, fate intervenes: our protagonist cannot find his keys. A few minutes later, a police officer happens upon him, on his hands and knees, under a streetlight, at least 50 paces from where his car is parked. Perplexed, the officer taps the man on the shoulder.
“What are you doing, sir?”
“Trying to find my car keys.”
“Shouldn't you be looking closer to your car?”
“I'd love to. But the light is better over here.”
As you probably surmised from the officer's rather cavalier treatment of an inchoate felony, this is not a real story. It is an old parable told by economists to illustrate something called the “streetlight effect,” which describes our tendency to focus only on the data that are most readily available, even when they are not necessarily useful.1 Evidence of the streetlight effect can be found across several fields, from physics and astronomy to economics, wherever investigators draw suspect conclusions from analysis of irrelevant data. Medicine, unsurprisingly, has not been immune.
Cancer researchers in particular have struggled with this bias. It is not their fault; it is just the nature of the problem with which they are wrestling. Ideally, the study of a new drug or technique to fight cancer includes the assessment of whether the proposed intervention will lengthen average survival. This makes good sense; the first thing patients want to know after a cancer diagnosis is whether they are going to die. However, even though this may seem to be a simple concept, overall survival can be challenging to measure. It demands large study populations and drawn-out investigation times. In many cases, performing an overall survival study simply may not be financially or logistically feasible. In those situations, studying progression-free survival (PFS), the length of time a patient lives without the cancer getting worse, may be more practical. Intuitively, it seems PFS should be closely tied to overall survival, and it is a whole lot easier to test for. PFS is squarely in the streetlight.
The problem is that there is a reasonable chance that overall survival is way over there, closer to the car. Even as published oncologic trials have begun to focus on PFS as a matter of course, we have yet to firmly establish that there is in fact a tight relationship between PFS and overall survival. There are also a number of reasons why, in any given case, these 2 measures may not be closely linked.2 It could be that the change in tumor size required to count as progression is so small that it does not affect survival time. Perhaps, because the process of dating the start of progression is notoriously imprecise, a detected change in PFS does not reflect a true difference. Perhaps there is some other complicated biologic explanation that we are not even considering. It is impossible to know.
The PFS/overall survival story is further muddied by the distinction between statistical significance and clinical significance. Statistical significance is what your freshman math teacher would go on and on about: essentially, it refers to having a certain level of assurance that the relationship you are observing is not just the result of random chance. Clinical significance is a little different. It is concerned with whether one can be confident that the intervention in question improves a particular patient-centered outcome, like quality of life or age at death, by a margin large enough to matter. A large dataset can easily demonstrate statistical significance for all manner of relationships, but whether those relationships are clinically significant is a different question altogether. The key point is that statistical significance is easy to measure (you just plug some numbers into a formula), whereas clinical significance is much trickier and sometimes even requires us to brave the horror of looking beyond quantitative measures when evaluating patient outcomes. Care to guess on which brand of significance we tend to fixate?
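To make the "large dataset" point concrete, here is a minimal sketch in Python. It assumes two entirely hypothetical trials that observe the same half-percentage-point gap in event rates (10.5% vs 10.0%) and differ only in size. The function name `two_proportion_p_value` and all of the counts are invented for illustration, and the pooled two-proportion z-test is just one common choice of "formula," not one the essay itself specifies.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in event rates (pooled z-test)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Pooled event rate under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: identical rates (10.5% vs 10.0%), very different sizes.
p_small = two_proportion_p_value(105, 1_000, 100, 1_000)
p_large = two_proportion_p_value(105_000, 1_000_000, 100_000, 1_000_000)

print(f"1,000 per arm:     p = {p_small:.3f}")
print(f"1,000,000 per arm: p = {p_large:.1e}")
```

Under these assumptions, the smaller trial comes nowhere near the conventional 0.05 threshold, while the larger one clears it with room to spare. The size of the effect is identical in both cases; only the number of patients changed. Whether half a percentage point matters to anyone is a question the p-value cannot answer.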
Statistical vs clinical significance is a battle royale being constantly waged throughout all facets of medical research. Our ongoing efforts to determine the link between diet and cardiovascular disease are a striking example of this. For generations, the received wisdom on the most heart-healthy diet has been all over the map, meandering from low-carbohydrate to low-fat, to Atkins, to paleo and back again, until, in 2013, the New England Journal of Medicine published an article lauding the cardioprotective properties of the olive oil-heavy so-called Mediterranean diet.3 The paper trumpeted a 30% relative reduction in stroke and myocardial infarction in the Mediterranean group, with a P value well within the realm of statistical significance. The fine print, however, showed that in absolute terms the difference in the likelihood of an eventual cardiovascular event amounted to, at best, 1 percentage point. Does this reach the level of clinical significance? I have no idea, and I do not think anyone else does either.
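The gap between a 30% headline and a 1-percentage-point difference is simple arithmetic, sketched below in Python. The event rates used here (3.3% in the control group, 2.3% in the diet group) are hypothetical, chosen only because they are consistent with the two figures quoted above; they are not taken from the cited paper. The function name `risk_summary` is likewise an invention for this example, and the number needed to treat is a standard companion measure that the essay does not itself discuss.

```python
def risk_summary(control_rate, treated_rate):
    """Return relative risk reduction, absolute risk reduction, and number needed to treat."""
    # Absolute risk reduction: the plain difference between the two event rates.
    arr = control_rate - treated_rate
    # Relative risk reduction: that difference as a fraction of the control rate.
    rrr = arr / control_rate
    # Number needed to treat: how many patients must be treated to prevent one event.
    nnt = 1 / arr
    return rrr, arr, nnt


# Hypothetical event rates, for illustration only.
rrr, arr, nnt = risk_summary(control_rate=0.033, treated_rate=0.023)

print(f"Relative risk reduction: {rrr:.0%}")
print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
print(f"Number needed to treat:  {nnt:.0f}")
```

With these made-up rates, the same result can be described as roughly a 30% reduction, a 1-percentage-point reduction, or about 100 people changing their diet to prevent one event. All three are accurate; which one lands in the headline is a matter of choice.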
The wars waged over dietary science in peer-reviewed journals also reveal how the streetlight effect can warp lay perceptions. Drivers of the cyclical attempts to pin America's obesity epidemic on sugar (or fat, or any number of culprits) inevitably find their way to the tried-and-true technique of using carefully selected end points to drive the conversation. At various junctures, well-publicized studies have described sugar as “[non-contributory] to poor health,” “empty calories,” “an important energy source for children,” and the cause of conditions as varied as gout, stroke, and dementia.4 That is quite the range and, probably more to the point, quite the range of interests being served. The unifying theme, of course, is that these studies each opted to focus on specific measures, without particular regard for whether they were truly proxies for the outcomes that actually concern us. They sure are easy to see, though.
Our drunk friend stumbled around for a few more minutes until the police officer took pity and drove him home. He would be a bit embarrassed (and mildly hung over) the next day, and would eventually rue his upcoming court appearance, but all in all, the outcome was about as positive as could have been expected. There was also a happy ending: the next morning, he returned to his car and, aided by sobriety and sunlight, found his keys just a few feet from the vehicle. They were right where he thought they would be.
1. Moritz M. Big data's 'streetlight effect': Where and how we look affects what we see. The Conversation. Available at: theconversation.com/big-datas-streetlight-effect-where-and-how-we-look-affects-what-we-see-58122. May 17, 2016. Accessed November 15, 2017.
2. Booth CM, Eisenhauer EA. Progression-free survival: meaningful or simply measurable? J Clin Oncol. 2012;30(10):1030-1033.
3. Estruch R, Ros E, Martínez-González MA. Mediterranean diet for primary prevention of cardiovascular disease. N Engl J Med. 2013;369(7):676-677.
4. Groopman J. Is fat killing you, or is sugar? The New Yorker. Available at: www.newyorker.com/magazine/2017/04/03/is-fat-killing-you-or-is-sugar. April 3, 2017. Accessed November 15, 2017.