# Statistical significance vs. clinical significance

Posted on 23rd March 2017 by Cindy Denisse Leyva De Los Rios

#### What if I told you that I conducted a study which shows that a single pill can significantly reduce tiredness without any adverse effects?

Would you try it? Or recommend it? Or would you want more information to decide? Maybe you’re a little skeptical. I will show you the results, so don’t make a decision just yet.

#### From now on let’s imagine this scenario…

Before I tell you the results of my study, you need to know how it was carried out.

- First, I took a group of 2,000 adults aged 20-30, all of whom suffered from constant tiredness. The participants were then randomly divided into 2 groups, with 1,000 participants in each.
- One group of participants (the intervention group) was given the new drug: *energylina*. The other group of participants (the control group) was given a dummy (placebo) pill.
- Nobody knew – neither the participants nor the researchers involved in the experiment – whether a given participant was taking *energylina* or the placebo.
- The participants took the pills for 3 weeks, 2 per day.
- We used a scale to measure participants’ levels of tiredness before and after the trial. This rated fatigue on a scale of 1 to 20, with 1 meaning the participant felt entirely well-rested and 20 meaning the participant felt entirely fatigued.
- The results revealed that: 90% of the participants in the energylina group improved by 2 points on the scale, while 80% of participants in the placebo group improved by 1 point.
- This difference between the groups was statistically significant (p < 0.05) meaning that, at the end of the 3 weeks, participants in the intervention group were significantly less tired than those in the control group.

#### So does that mean the treatment is effective? Should you take “energylina”? Should every doctor prescribe it?

Not necessarily! Let’s make a couple of things clear first. **At this point, you might be wondering why the title of this blog is ‘statistical significance vs. clinical significance’.**

Well, I will explain it right now; the results I gave you are there to help you make a decision. You want to know whether energylina is effective enough to recommend to individuals who suffer from fatigue. Did the results convince you?

Before you answer, first let me clarify something: **Clinical significance is the practical importance of the treatment effect – whether it has a real, palpable, noticeable effect on daily life.** For example, imagine a safe treatment that could reduce the number of hours you suffered with flu-like symptoms from 72 hours to 10 hours. Would you buy it? Yes, probably! When we are unwell, we want to feel better as quickly as possible. So, in simple terms, if a treatment makes a positive and noticeable improvement to a patient, we can call this ‘clinically significant’ (or clinically important).

In contrast, statistical significance is ruled by the p-value (and confidence intervals). When we find a difference where p < 0.05, we call this ‘statistically significant’. Just like our results from the above hypothetical trial. If a difference is statistically significant, it simply means it was unlikely to have occurred by chance. It doesn’t necessarily tell us about the *importance* of this difference or how meaningful it is for patients.
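To make this concrete, here is a minimal sketch (in Python, using only the standard library) of a pooled two-proportion z-test applied to the hypothetical trial’s improvement rates. The test itself is a standard one, but treating "improved" as a yes/no outcome per participant is an assumption made for illustration:

```python
import math

def two_proportion_z_test(improved_a, n_a, improved_b, n_b):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p_a, p_b = improved_a / n_a, improved_b / n_b
    pooled = (improved_a + improved_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| > |z|) for a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical trial: 90% of 1,000 improved on energylina,
# 80% of 1,000 improved on placebo
z, p = two_proportion_z_test(900, 1000, 800, 1000)
print(f"z = {z:.2f}, p = {p:.1e}")
```

The p-value here comes out far below 0.05, yet nothing in the calculation says whether a 1-point difference on a 20-point tiredness scale matters to patients.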

### So it’s important to consider that trial results could be…

- **Statistically significant AND clinically important.** This is where there is an important, meaningful difference between the groups and the statistics also support this. (The flip side of this is where a difference is neither clinically nor statistically significant.)
- **Not statistically significant BUT clinically important.** This is most likely to occur if your study is underpowered and you do not have a large enough sample size to detect a difference between groups. In this case you might fail to detect an important difference between groups.
- **Statistically significant BUT NOT clinically important.** This is more likely to happen the larger your sample size. If you have enough participants, even the smallest, most trivial differences between groups can become statistically significant. It’s important to remember that just because a treatment is statistically significantly better than an alternative treatment, *this does not necessarily mean that the difference is clinically important or meaningful to patients.*
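The sample-size effect in that last bullet can be demonstrated directly. In this sketch (the 51.5% vs. 50.0% improvement rates are invented purely for illustration), the very same trivial difference is nowhere near significant with 1,000 participants per group, but becomes highly significant with 100,000 per group:

```python
import math

def two_sided_p(rate_a, rate_b, n_per_group):
    """p-value of a pooled two-proportion z-test for two equal-sized groups."""
    a = round(rate_a * n_per_group)  # number improved in group A
    b = round(rate_b * n_per_group)  # number improved in group B
    pooled = (a + b) / (2 * n_per_group)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = ((a - b) / n_per_group) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny difference (51.5% vs 50.0% improving), different sample sizes
print(two_sided_p(0.515, 0.500, 1000))     # well above 0.05: not significant
print(two_sided_p(0.515, 0.500, 100000))   # far below 0.05: "significant"
```

Nothing about the difference itself changed between the two calls; only the number of participants did.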

### Going back to our hypothetical study, what have we got: statistical significance, clinical significance, or both?

Remember we had 2 groups, with 1000 participants in each. In the intervention group, 90% of the participants improved by 2 points on the tiredness scale whereas 80% of the participants in the placebo group improved by 1 point on the tiredness scale.

Is the difference between both groups remarkable? Would you buy my product to have a slightly higher probability of achieving 1 point less on a tiredness scale, compared with taking nothing? Perhaps not. You might only be willing to take this new pill if it were to lead to a bigger, more noticeable benefit for you. For such a small improvement, it might not be worth the cost of the pill. So although the results may be statistically significant, they may not be clinically important.

### To avoid falling into the trap of thinking that because a result is statistically significant it must also be clinically important, you can look out for a few things…

- Look to see if the authors have specifically mentioned whether the differences they have observed are clinically important or not.
- Take into account sample size: be particularly aware that with very large sample sizes even small, unimportant differences may become statistically significant.
- Take into account effect size. In general, the larger the effect size, the more likely it is that the difference will be meaningful to patients.
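As a concrete illustration of the effect-size point, one conventional measure for a difference between two proportions is Cohen’s h. Applying it to the hypothetical trial’s improvement rates is a sketch only (Cohen’s rough benchmarks – 0.2 small, 0.5 medium, 0.8 large – are rules of thumb, not clinical judgments), but it shows a "small" effect despite the tiny p-value:

```python
import math

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions (arcsine-transformed difference)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# 90% improved on energylina vs 80% on placebo
h = cohens_h(0.90, 0.80)
print(f"Cohen's h = {h:.2f}")  # ~0.28: a 'small' effect by Cohen's benchmarks
```

A statistically significant result with a small effect size is exactly the "significant BUT NOT clinically important" pattern described above.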

So to conclude, just because a treatment has been shown to lead to *statistically significant* improvements in symptoms does not necessarily mean that these improvements will be *clinically significant* (i.e. meaningful or relevant to patients). That’s for patients and clinicians to decide.

## Comments on Statistical significance vs. clinical significance

**Cecil Cristian Liebsch Martinez**

Dear Cindy & all readers:

I appreciate the comments made by all of you. However, I consider it appropriate to contribute my own view, which I would like to explain “in extenso”, though that would not be appropriate here, for obvious reasons.

Succinctly, as we know:

1 – The p-value is the probability of observing the difference found between the exposed and non-exposed groups, or a more extreme one, if the null hypothesis (H₀) is TRUE. This, in frequentist terms.

2 – If the null hypothesis IS NOT REJECTED, because there is not enough evidence to reject it (p >= 0.05, at 95% confidence), it IS NOT thereby CONFIRMED; it may nonetheless be FALSE, in which case, by chance, we have a Type II error.

3 – The p-value is the probability of observing a difference if it does not really exist; its statistical significance is merely probabilistic and does not show the MAGNITUDE of the EFFECT between the compared groups, for which the p-value is completely uninformative. Therefore, care must be taken when using “significance” with the p-value.

Much more could be said about it, but I think it is better to cite these links, to better illustrate what is stated here.

Kind regards!

Cecil Christian

P.S. : See these links.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689604/#:~:text=The%20essential%20differences%20between%20p,the%20level%20of%20data%20measurement.

The following is in Spanish; please translate:

https://scielosp.org/article/rpsp/2004.v15n5/293-296/es/

29th April 2023 at 11:46 am

**mandeep**

“improvements in symptoms does not necessarily mean that these improvements will be clinically significant”

26th March 2023 at 5:58 am

**David Colquhoun**

Perhaps it might be useful also to point out the fallibility of “statistical significance”. For many decades it’s been accepted by statisticians that, when testing a null hypothesis, if you observe a p-value just below 0.05 then the risk that your result is a false positive is not 5% but at least 20-30%. Although this is well known to statisticians, it seems to be almost unknown to experimenters.

See, for example, https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529622

29th June 2022 at 2:26 pm

**George SORBOR**

Thank you for this very helpful piece of information. It really helped me.

10th March 2021 at 12:29 pm

**Kim**

Interesting

25th May 2020 at 10:05 pm

**Adriana**

Hello Cindy,

Excellent way to explain a very confusing concept. This ability to make challenging subjects easy to understand will certainly make you a much-in-demand and popular professor in the future. Stay on your journey and you will be an amazing physician. Blessings to you and congratulations!

Blessings & much success

15th September 2019 at 1:50 am

**Oby chilaka**

Very impressive and straight to the point. Thanks a lot.

15th August 2019 at 4:16 am

**Alpana Kulkarni**

This was an amazing explanation and I understand the terms in a much better way! Thank you.

23rd January 2019 at 11:52 am

**Mario Tristan**

Very good; however, it is important to clarify how the effect size is observed or estimated.

14th October 2018 at 7:11 pm

**Gillian**

Really well explained to an old student and patient… Thank you.

10th July 2018 at 10:16 am

**Sue**

Many thanks Cindy, really well explained.

8th July 2018 at 8:29 pm