
Applied Kinesiology: a critical appraisal

Posted on 5th July 2019 by Ruben Fernandez Matias

Evidence Reviews

This blog is a critical appraisal of the following pilot study: Intraexaminer comparison of applied kinesiology manual muscle testing of varying durations: a pilot study (1).


Applied kinesiologists use manual muscle tests, following methods similar to those described by Kendall, to diagnose dysfunctions and guide clinical decisions.

The muscle can be rated as “facilitated” or “strong” if the subject can maintain the position against gradually increasing pressure; or it can be rated as “inhibited” or “weak” if the muscle weakens during the procedure.

One of the factors that could influence the results of the manual muscle tests used in applied kinesiology is the duration of the pressure applied to the muscle during the test. Differences in the duration of the pressure could result in different scores between different raters, or within the same rater at different times.

Aim of the study

The aim of the study was to compare the results (strong/weak) of short (1 second) and long (3 seconds) manual muscle tests performed on the same subject.

Null hypothesis: The duration of the test does not influence the outcome.

Alternative hypothesis: The 2 conditions are at least partially independent of each other and so would demonstrate a low Cohen’s Kappa value.

Secondarily, the study compared peak force of the Maximum Voluntary Isometric Contractions (MVIC) tests and peak force of manual muscle tests between strong and weak tests to further define the objective differences between the states applied kinesiologists refer to as “strong” and “weak”.

Methodology of the study

An applied kinesiologist, with more than 30 years of experience practising and teaching the subject, examined 44 chiropractic students (23 men; 21 women). The manual muscle test was performed on the middle deltoid muscle.

All subjects were tested in 3 ways, including MVIC against a strap, manual muscle test for 1 second, and manual muscle test for 3 seconds.

All subjects performed the MVIC first and then were manually tested.

The order of the two manual muscle tests (1 second; 3 seconds) was assigned randomly by the toss of a die. Subjects for whom the toss was even had the short test first, and subjects for whom the toss was odd had the long test first.

During the manual muscle tests the following parameters were recorded:

  • Estimate of MVIC (pounds)
  • Result of the manual muscle test (strong/weak)
  • Duration of the manual muscle test
  • Peak force of manual test (pounds)
  • Peak force as a percentage of MVIC

Results of the study

The short tests averaged 1.09 seconds, while the long tests averaged 2.34 seconds (mean difference = -1.245, p < 0.0001). Long tests averaged significantly higher peak force than short tests in absolute terms (mean difference = -0.804, p = 0.0002) and as a proportion of MVIC (mean difference = -0.32, p = 0.0001).

In the short tests there were 42 strong and 2 weak results. In the long tests there were 39 strong and 5 weak results. The Cohen’s Kappa coefficient for agreement between short and long duration tests was 0.54, indicating only fair agreement between the 2 conditions. The null hypothesis was defined as good agreement, that is, a Kappa value of 0.61 or greater. Because 0.54 falls below this threshold, the null hypothesis was rejected; the duration of the manual muscle tests does appear to matter.

Contingency Table 2×2 (rows: short duration test; columns: long duration test)

                      Long: strong   Long: weak   Total
Short: strong              39             3         42
Short: weak                 0             2          2
Total                      39             5         44



Peak force as a percentage of MVIC was greater in the muscles rated as weak compared to the muscles rated as strong in the short (mean difference = -0.103, p = 0.0114) and long duration tests (mean difference = 0.116, p = 0.0005).

Peak force in pounds was not different in the muscles rated as weak compared to the muscles rated as strong in the short duration tests (mean difference = -0.755, p = 0.4789). In the long duration tests, peak force was higher in the muscles rated as weak compared to the muscles rated as strong (mean difference = -2.131, p = 0.0054).

Critical appraisal

There are some statistical considerations that should be taken into account in this study.

First, the fact that there were 42 (1 second) and 39 (3 seconds) positive results out of a total of 44 trials indicates the presence of a high Prevalence Index (2).

The Kappa coefficient is influenced by the prevalence of the attribute. When the numbers of positive and negative agreements (here, strong/strong and weak/weak) differ greatly, there is a prevalence effect, which can be expressed by the Prevalence Index (2):

Prevalence Index = |positive agreement – negative agreement| / overall number of trials.

In this study, the Prevalence Index is 0.84.
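As a worked check, the value can be reproduced from the contingency table in the Results section: 39 tests were rated strong in both conditions, 2 were rated weak in both, and there were 44 tests in total. The symbols a, d and n are notation introduced here for illustration, not taken from the study.

```latex
\[
\text{Prevalence Index} = \frac{|a - d|}{n} = \frac{|39 - 2|}{44} = \frac{37}{44} \approx 0.84
\]
```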

When there is a high Prevalence Index, chance agreement is also high and Kappa is reduced accordingly (2). In this study, the absolute agreement between the two conditions (1 second, 3 seconds) is 93.18% and the chance agreement is 85.12%.
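These figures can be checked with a short calculation. The sketch below is a minimal illustration, assuming the standard Cohen's Kappa formula for a 2×2 table; the function name and the cell labels a, b, c, d are hypothetical and introduced only for this example, with the counts taken from the contingency table in the Results section.

```python
def agreement_stats(a, b, c, d):
    """Return (observed agreement, chance agreement, Cohen's kappa) for a 2x2 table.

    a = strong in both tests, b = strong in short / weak in long,
    c = weak in short / strong in long, d = weak in both tests.
    """
    n = a + b + c + d
    # Observed (absolute) agreement: proportion of tests on the diagonal
    p_o = (a + d) / n
    # Chance agreement: product of the marginal totals for each category
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    # Kappa: agreement beyond chance, scaled by the maximum possible
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# Cell counts from the contingency table in the Results section
p_o, p_e, kappa = agreement_stats(39, 3, 0, 2)
print(f"observed = {p_o:.4f}, chance = {p_e:.4f}, kappa = {kappa:.2f}")
# prints: observed = 0.9318, chance = 0.8512, kappa = 0.54
```

Because the chance agreement is already 85.12%, only about 15 percentage points remain for agreement beyond chance, which is why an observed agreement of 93.18% yields a Kappa of only 0.54.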

There is a way to control for this effect: the prevalence-adjusted bias-adjusted kappa (PABAK). The PABAK gives a value of the Kappa coefficient with minimal influence of prevalence and bias. It is calculated by changing the values of the cells in the contingency table so that the differences between the positive and negative agreements, and between the two types of disagreement, are minimized, without changing the total number of agreements or disagreements (2).

The PABAK for this study is calculated based on the following contingency table:

Contingency Table 2×2 (adjusted for prevalence and bias; rows: short duration test; columns: long duration test)

                      Long: strong   Long: weak   Total
Short: strong              21             2         23
Short: weak                 1            20         21
Total                      22            22         44


The PABAK value is 0.8636. This value is much higher than the normal Kappa coefficient reported in the study.
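The following sketch reproduces this value in two ways, under stated assumptions: first with the closed form PABAK = 2 × (observed agreement) − 1, which is the usual shortcut for a two-category table, and second by applying the ordinary Kappa formula to the adjusted cell counts (21, 2, 1, 20). The function names are hypothetical and exist only for this illustration.

```python
def cohen_kappa(a, b, c, d):
    """Ordinary Cohen's kappa for a 2x2 table with cells a, b / c, d."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e)


def pabak(a, b, c, d):
    """PABAK for two categories: depends only on the observed agreement."""
    n = a + b + c + d
    p_o = (a + d) / n
    return 2 * p_o - 1


# Closed form applied to the original table (39, 3 / 0, 2)
pabak_original = pabak(39, 3, 0, 2)

# Ordinary kappa applied to the adjusted table (21, 2 / 1, 20)
kappa_adjusted = cohen_kappa(21, 2, 1, 20)

print(f"PABAK = {pabak_original:.4f}, kappa on adjusted table = {kappa_adjusted:.4f}")
# prints: PABAK = 0.8636, kappa on adjusted table = 0.8636
```

Both routes agree because the adjusted table has a chance agreement of exactly 50%, so the Kappa formula reduces to twice the observed agreement minus one.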

Secondly, the objective strength measurements produced some curious results. The tests rated as “weak” by the applied kinesiologist showed more strength than those rated as “strong”. Also, the 3 second tests showed more objective strength than the 1 second tests, even though more of them were rated as “weak”.


Based on the values of absolute agreement and PABAK, as well as the inconsistencies in the strength measurement results, we cannot be confident that the results of Applied Kinesiology’s Manual Muscle Test differ depending on the test duration, or that the applied kinesiologist’s subjective scores reflect real changes in strength within a muscle.

References (pdf)


Ruben Fernandez Matias

I'm a physiotherapist from Spain. I'm currently studying a master's degree in manual therapy and therapeutic exercise. Twitter: @RubenFMat.
