Correlation and Causation: a simple guide
Posted on 22nd February 2017 by Ludwig Ruf
Consciously or subconsciously, we are always looking for explanations of why things around us happen the way they do. Let’s say I want to know why I had a headache yesterday, or why some genes make human cells more likely to turn into cancer cells. Finding the real cause that triggers an outcome is important for three main reasons. It enables us to 1) explain the current situation, 2) predict future outcomes, and 3) design interventions that target the cause in order to change the outcome.
The difficult part, of course, is finding the cause. Establishing a cause is hard because behaviour and physiological processes are often the result of complex interactions between a multitude of factors. When things become this complex, we try to break them down into their smallest units, investigate the relationships between those units and then put everything back together in order to draw general conclusions.
In research, this is typically done by correlating the variables of interest with each other, that is, by checking whether, as one variable increases, another variable also increases (a positive correlation) or decreases (a negative correlation).
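To make positive and negative correlation concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the most common way of putting a number on such a relationship. It ranges from -1 (a perfect negative correlation) through 0 (no linear relationship) to +1 (a perfect positive correlation). The function name `pearson_r` and the small data sets are made up for illustration; they are not taken from any of the studies discussed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, relative to how much each varies on its own
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to go up as x goes up
falling = [9, 7, 6, 4, 2]   # tends to go down as x goes up
print(round(pearson_r(x, rising), 2))   # → 0.85  (positive correlation)
print(round(pearson_r(x, falling), 2))  # → -0.99 (negative correlation)
```

On Python 3.10 or later, the standard library's `statistics.correlation` computes the same Pearson coefficient; `pearson_r` is written out here only to show the arithmetic. Note that Pearson's r captures only linear relationships and, like any correlation, says nothing about cause.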
Example 1: Chocolate consumption & Nobel Prize winners
For example, a study found that a country’s chocolate consumption per capita was positively correlated with its number of Nobel Prize winners per 10 million residents: the higher the chocolate consumption, the more Nobel Prize winners (Messerli, 2012).
Beware: interpreting this correlation is not straightforward. The study provides no clear evidence about the direction of the effect, so we cannot make a causal interpretation such as ‘eating more chocolate causes more Nobel Prizes’ or ‘winning more Nobel Prizes makes you eat more chocolate’. In other words, we can say nothing about whether eating more chocolate increases the likelihood of winning a Nobel Prize, or vice versa. Correlations simply show a pattern; they do not reveal what produces it. This particular correlation could simply have occurred by chance. This is known as a spurious correlation, i.e. one where two or more events are not causally related but appear to be, either by coincidence or because both are driven by some unknown third factor.
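One way chance alone can produce a convincing-looking correlation is when both variables simply drift over time. The Python sketch below is an illustrative simulation, not an analysis of the chocolate data: it repeatedly generates two series that are completely independent of each other and counts how often they still end up strongly correlated (here, arbitrarily, |r| > 0.5). All function names and parameter values (50 steps, 1000 trials, the 0.5 threshold, the seed) are assumptions chosen for the illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def random_walk(steps, rng):
    """A series that drifts: each value is the previous one plus a random +1 or -1 step."""
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path

def white_noise(steps, rng):
    """A series with no drift: every value is drawn independently of the others."""
    return [rng.gauss(0, 1) for _ in range(steps)]

def share_strongly_correlated(make_series, trials=1000, steps=50, threshold=0.5, seed=42):
    """Fraction of independently generated pairs whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    strong = 0
    for _ in range(trials):
        a = make_series(steps, rng)
        b = make_series(steps, rng)  # built separately: a has no influence on b
        if abs(pearson_r(a, b)) > threshold:
            strong += 1
    return strong / trials

print("drifting series:", share_strongly_correlated(random_walk))
print("non-drifting series:", share_strongly_correlated(white_noise))
```

The exact fractions depend on the seed and parameters, but independent drifting series cross the threshold far more often than independent non-drifting ones, even though by construction neither series influences the other. A strong correlation on its own, especially between two trends, is therefore weak evidence of anything causal.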
Example 2: Antibiotic exposure during first year of life & weight gain in early childhood
Let’s examine a second example: the potential association between antibiotic exposure within the first year of life and weight gain during early childhood. Research indicates that receiving more antibiotic prescriptions is associated with a higher risk of being overweight later in childhood (e.g. Bailey et al., 2014).
It might seem logical to conclude that taking antibiotics in the first year of life causes excessive weight gain during early childhood. However, again, this type of research only shows a correlation. It does not examine why these children become overweight compared with children who received fewer or no antibiotics. The follow-up question has to be: what is the underlying physiological mechanism behind this connection? While this research is a helpful first step, we should take it only as a starting point for discovering the true mechanisms (if there are any). Without doing that, our interventions will be less effective because they will not target the actual cause.
Example 3: Increased BMI and increased risk of cancers
Lastly, let’s consider a third example. Increased BMI seems to be associated with an increased risk of several cancers in adults (Renehan et al., 2008).
Again, we might be misled here. It would be wrong to conclude that simply being overweight causes cancer. Instead, we need to consider other variables that might explain the relationship between increased BMI and increased cancer risk. For instance, it can be argued that people with lower socioeconomic status are less informed about potential risk factors, can’t afford good healthcare (e.g. preventive measures that reduce cancer risk), or simply have a lifestyle that favours the development of certain diseases (e.g. less physical activity, a poorer diet and so on). In fact, socioeconomic status does seem to be associated with BMI in British women aged between 37 and 73 years (Tyrrell et al., 2016).
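The third-factor explanation can also be illustrated with a simulation. In the hypothetical Python sketch below, a made-up socioeconomic score (`ses`) drives both a BMI-like variable and a cancer-risk-like variable, while neither of those two variables has any effect on the other. The coefficients, sample size and variable names are invented, not estimates from Renehan et al. (2008) or Tyrrell et al. (2016), and the code does not show that socioeconomic status actually explains the real association between BMI and cancer. It only shows how a shared cause can create a correlation, and how that correlation disappears once the shared cause is statistically held constant (here via a partial correlation: correlating what is left of each variable after a straight-line fit on `ses`).

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, xs):
    """What remains of ys after an ordinary least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def simulate_confounding(n=5000, seed=1):
    """Hypothetical data: ses drives both variables; bmi and risk never influence each other."""
    rng = random.Random(seed)
    ses = [rng.gauss(0, 1) for _ in range(n)]
    # Lower ses raises both outcomes (negative coefficients); the noise terms are independent.
    bmi = [25 - 2.0 * s + rng.gauss(0, 2.0) for s in ses]
    risk = [1 - 0.5 * s + rng.gauss(0, 0.5) for s in ses]
    raw = pearson_r(bmi, risk)
    # Partial correlation: correlate the parts of bmi and risk that ses does not account for.
    adjusted = pearson_r(residuals(bmi, ses), residuals(risk, ses))
    return raw, adjusted

raw, adjusted = simulate_confounding()
print(f"raw r = {raw:.2f}, r after adjusting for ses = {adjusted:.2f}")
```

With these invented coefficients the raw correlation should come out at roughly 0.5 and the adjusted one close to zero. The catch, which is exactly why it is worth asking about unmeasured variables, is that this kind of adjustment only works for third factors that were actually measured.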
These three examples illustrate some common pitfalls in drawing conclusions from correlational studies. Even when we are aware of these pitfalls, they can be difficult to avoid.
Nevertheless, I would recommend asking yourself the following questions when dealing with correlations:
- Is there scientific evidence, or at least plausible logic, regarding the direction of the effect? (See the chocolate example.)
- Are there intermediate variables that could explain the correlation, e.g. a biological mechanism linking the two variables? (See the antibiotic example.)
- Are there unmeasured variables that could explain the correlation, e.g. a third factor that influences both variables? (See the cancer example.)
To conclude, observing correlations between variables can be relatively straightforward, but establishing that one thing causes another is difficult. When reading articles or scientific papers, make sure to be critical: question whether a reported correlation between two variables really reflects a causal relationship.
So what is needed for the future? I think we need to develop a big picture of how variables are interconnected, rather than searching for isolated associations between individual variables.
But until we have better answers about these complex interactions, this might be a good excuse to go for that bite of chocolate. You never know.
References
Bailey, L.C., Forrest, C.B., Zhang, P., Richards, T.M., Livshits, A., DeRusso, P.A., 2014. Association of antibiotics in infancy with early childhood obesity. JAMA Pediatr. 168, 1063-1069. doi:10.1001/jamapediatrics.2014.1539
Messerli, F.H., 2012. Chocolate consumption, cognitive function, and Nobel laureates. N. Engl. J. Med. 367, 1562-1564. doi:10.1056/NEJMon1211064
Renehan, A.G., Tyson, M., Egger, M., Heller, R.F., Zwahlen, M., 2008. Body-mass index and incidence of cancer: a systematic review and meta-analysis of prospective observational studies. Lancet 371, 569-578. doi:10.1016/S0140-6736(08)60269-X
Tyrrell, J., Jones, S.E., Beaumont, R., Astley, C.M., Lovell, R., Yaghootkar, H., Tuke, M., Ruth, K.S., Freathy, R.M., Hirschhorn, J.N., Wood, A.R., Murray, A., Weedon, M.N., Frayling, T.M., 2016. Height, body mass index, and socioeconomic status: mendelian randomisation study in UK Biobank. Br. Med. J. 352, i582. doi:10.1136/bmj.i582
Comments on Correlation and Causation: a simple guide
The blog is really great for medical students… thank you for all the entertainment being provided.
15th April 2019 at 6:32 am
Thank you for reminding us that “correlation” does not automatically mean “causation”.
13th March 2017 at 8:26 am
Another word/verb is often used in similar situations: to correspond.
How to position this one?
Thank you for the post! I would have loved some discussion of Bradford Hill’s criteria for causality (and their criticisms and more recent points of view; see articles by Glasziou and Ioannidis) to further guide readers, perhaps towards an even more nuanced view.
The problem with logic is that you can explain almost anything with it by making a couple of assumption jumps (e.g. chocolate has active ingredients, these can get into the blood, some can cross the blood-brain barrier, they could affect the brain and thus intellect).
The issue with biological mechanisms is that sometimes we simply don’t know how things work (as when antisepsis was discovered… and distrusted), or can’t yet explain them. Besides, biological mechanisms aren’t very good at predicting real effects in humans.
I hope you have fun doing your MSc!
22nd February 2017 at 7:29 pm
Hi Martin, thank you for your feedback. The article was intended as a short introduction rather than an in-depth analysis. Having said that, I was thinking about writing a more complex follow-up on how to detect causes systematically (e.g. John Stuart Mill’s and Bradford Hill’s criteria). I would be more than happy to invite you to write a blog post or share your thoughts on this topic, since I think you are more knowledgeable in this area than I am.
24th February 2017 at 10:06 am