Statistical power, importance of effect sizes, and statistical analysis
Thomas Steckler
Information Meeting for Grant Applicants on the Call for Proposals for Confirmatory Preclinical Studies and Systematic Reviews - Quality in Health Research, Feb 19, 2019
The views expressed in this presentation are solely those of the individual authors, and do not necessarily reflect the views of their employers.
Phillippe Vandenbroeck et al. (2006) J Biopharm Statistics 16, 61-75
Underpowered Studies: A Common Observation
Median Power of Studies Included in
Neuroscience Meta-Analyses
Included in the analysis were articles published in 2011 that described at least one meta-analysis of previously
published studies in neuroscience with a summary effect estimate (mean difference or odds/risk ratio) as well
as study level data on group sample size and, for odds/risk ratios, the number of events in the control group.
Studies with low statistical power have:
Reduced chance of detecting a true effect
Low likelihood that a statistically significant result reflects a true effect
Overestimated effect sizes
Low reproducibility
[Figure: median power of the included studies: low (57%), intermediate (29%), high (14%)]
Button et al., Nature Rev Neurosci, 2013
[Figure: power as a function of animal number]
Another Consequence of Underpowered Studies: Violation of the 3Rs
Adapted from M Macleod
What is Power?
Power = 1 - β
Probability to correctly reject H0
Probability to avoid a Type II error or a false negative
Chance to correctly detect a difference (sensitivity)

Pregnancy-test example (test result vs. reality):
                        Reality: not pregnant           Reality: pregnant
Test: pregnant          P(false positive) = α           P(correct positive) = 1 - β
                        Type I error                    Power (sensitivity)
Test: not pregnant      P(correct negative) = 1 - α     P(false negative) = β
                        Specificity                     Type II error
Statistical Hypothesis Testing
H0: null hypothesis
H1: alternative hypothesis
α: P(false positive)
β: P(false negative)
1 - β: P(correct positive)
μ: population mean
σ: population standard deviation
Δ: effect size
[Figure: normal distributions of H0 and H1]
Statistical Hypothesis Testing
What is the probability of your data showing the observed difference when in reality there is no difference (i.e., μ1 - μ0 = 0)?
p-values describe these probabilities
- A formal decision-making process that ensures false positives (Type I errors, α) occur only at a predefined rate
- The smaller the p-value, the lower the probability that there is a false positive
- NOT the probability that an observed finding is true
- NOT implying a biologically meaningful effect
Statistical significance if p < 0.05
- Means that we are willing to reject H0 as long as the false positive rate stays below α = 5%
- Is an arbitrary convention, not a fundamental principle
What Makes a P-Value?
p is a value computed by a statistical test (and may differ according to the test used)
There is a mathematical relationship in any data set between:
- Accepted significance level (α), and hence the p-value
- Observed/desired effect size (Δ)
- Variability within the tested sample (standard deviation, σ)
- Sample size (n)
Changing one factor will affect the others:
- Type II error (β) will decrease as a function of sample size
- Power (1 - β) will increase as a function of sample size
- Sampling mean variability will decrease as a function of sample size
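To make that relationship concrete, here is a minimal Python sketch (not part of the original slides) that approximates the power of a two-sided, two-group comparison from α, σ, Δ and n using the normal approximation; all parameter values are arbitrary examples.

```python
# Illustrative sketch: approximate power of a two-sided, two-sample comparison
# as a function of n, using the normal approximation. Values are arbitrary examples.
from scipy.stats import norm

def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power for two equal groups of size n (two-sided test)."""
    se = sigma * (2.0 / n) ** 0.5        # standard error of the mean difference
    z_crit = norm.ppf(1 - alpha / 2)     # critical value for two-sided alpha
    return 1 - norm.cdf(z_crit - delta / se)

for n in (5, 10, 20, 40, 80):
    print(n, round(approx_power(n, delta=1.0, sigma=1.5), 3))
```

Increasing n shrinks the standard error, so the same Δ becomes easier to detect; the same function also shows how larger σ or smaller Δ pulls power back down.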
Relevance of Sample Size for Power
[Figure: overlapping H0 and H1 distributions (means μ0 and μ1, ± 1σ) for n = 5 and n = 10, marking the Type I error (α), Type II error (β) and power (1 - β); at the larger n the overlap shrinks and power increases]
Sample Size is Where You Randomize!
Sample size = number of experimental units per group
Experimental unit: the entity subjected to an intervention, independent of all other units
Observational unit: the entity on which the (primary) outcome measure is taken
Biological unit: the entity about which inferences are made
Example Teratogenicity Study
Regulatory requirement as part of drug development program
Q: Does treatment X have embryotoxic or teratogenic effects?
Study design:
Susan B. Laffan, Teratology Primer, 3rd Edition
Example Teratogenicity Study
Biological unit: the offspring (do fetuses show abnormalities?)
Observational unit: observations made in the offspring during external, visceral and skeletal examination (e.g., fetuses with branched ribs)
Experimental unit: the litter, not the individual fetuses! (the entity subjected to an intervention independent of all other units)
OECD and ICH do not allow offspring to be used as independent samples: they are pseudo-replicates
Why is Power Important?
A study may not show a difference between groups because:
There is no difference (true negative), or
The study failed to detect that difference (false negative)
https://www.youtube.com/watch?v=SRyP2BPGUgg
Issues with Low Power: Potentiation by Bad Practice
1,000 hypotheses, of which 10% are true → 100 true, 900 false
False positive rate α = 0.05: 5% of the 900 false hypotheses → 45 false positives
Power (true positive rate) = 0.2: 20% of the 100 true hypotheses → 20 true positives, 80 false negatives
Publication bias (only "positive" results published): 65 experiments published, of which 69% are false positives
With power 0.8 instead: 125 experiments published, of which 36% are false positives
Adapted from The Economist, 2013
Strategies Used Along the Discovery Chain
Target Identification → Target Validation → Hit Identification → Hit to Lead → Lead Optimization → NME Declaration
EXPLORATORY / SCREENING:
Absence of a scientifically plausible hypothesis
Methods can be adapted (to some degree)
p-values cannot be interpreted at face value
HYPOTHESIS TESTING / CONFIRMING:
A scientifically plausible hypothesis exists
Requires pre-specification of H0, experimental methods incl. sample size, and analytical methods
Leads to a statement of significance
Elements can be combined:
Confirmatory for the primary (and key secondary) endpoint
Exploratory for other endpoints
Strategies Used Along the Discovery Chain
Target Identification → Target Validation → Hit Identification → Hit to Lead → Lead Optimization → NME Declaration
Early in the chain: Type I error tolerance, need for high sensitivity
Look for strong effects
Important to discover potential early on
Discovery of false positives less serious
Primary risk lies with the company
Later in the chain: look for true effects
Avoid exposure of patients to non-active therapy
Only invest in development of efficacious therapy
Risk lies with the company and with the authorities
Power: low (early) → high (late)
Other Ways to Affect Power
[Figure: power (1 - β) for a baseline scenario with n = 5, σ = 1.7, α = 0.05, Δ = 0.1, and three variations]
Choose a different significance level (α): e.g., α = 0.1
Change the variability within the sample (σ): e.g., σ = 3.4
Alter the effect size (Δ): e.g., Δ = 0.05
Adapted from F Tekle
The Different Factors Interact
Smaller effect sizes require larger sample sizes
Measures with large variability require larger sample sizes
Adapted from F Tekle
Increasing Sample Size Can Make a Result Significant
Hypothetical data set (adapted from M. Michel, 2017): control (mean 0.970) vs. manipulation (mean 1.026) in three scenarios
Low variability: n = 5, σ = ±0.046 vs. ±0.045, p = 0.411 in t-test
Very low variability: n = 5, σ = ±0.007 vs. ±0.005, p = 0.002 in t-test *
Low variability, but large n: n = 25, σ = ±0.046 vs. ±0.045, p = 0.039 in t-test *
But is it meaningful?
What is a Meaningful Effect?
If the sample size is too high and the desired effect size is not defined, there is an increased risk
of making a false positive conclusion (the literature is full of them!)
that statistically significant effects are declared that are of no practical relevance
that resources are wasted
of ethical issues (e.g., in case of animal studies)
A desired effect
should be relevant (scientifically, clinically, economically, …)
should be determined upfront (which effect must be detectable to make the study relevant?)
should be documented (to avoid bias)
is a choice by the researcher, based on estimation or experience, not statistically determined
but will affect statistics!
Low Sample Size Inflates Measured Effect Sizes → Low Power and High FDR
Exploration of sample size sets of n = 10, 20, 30, 50, 85, 170, 500, and 1000, drawn from a multivariate normal distribution
1,000 studies simulated per n
True effect size arbitrarily set to Δ = 0.3
To obtain 80% power for Δ = 0.3 requires 85 samples
Jim Grange (2017) https://jimgrange.wordpress.com/2017/03/06/low-power-effect-sizes/
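A simplified two-group variant of this simulation can be sketched in a few lines of Python (assumptions, not from the slides: normally distributed data with SD 1, true Cohen's d = 0.3, two-sided t-test at α = 0.05). Averaging the estimated effect size only over the "significant" runs shows the inflation at small n.

```python
# Minimal simulation sketch: effect sizes estimated only from "significant"
# results are inflated when sample size (and hence power) is low.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_d, n_sims = 0.3, 1000

for n in (10, 20, 30, 50, 85, 170):
    sig_d = []
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        if ttest_ind(a, b).pvalue < 0.05:
            pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
            sig_d.append(abs(b.mean() - a.mean()) / pooled_sd)
    print(f"n={n:4d}  significant: {len(sig_d):4d}  mean |d| among significant: {np.mean(sig_d):.2f}")
```

At small n only studies that happen to overestimate the effect reach p < 0.05, so the published (significant) effect sizes are systematically too large.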
False Discovery Rate vs. Positive Predictive Value
Positive predictive value (PPV): probability that a real effect exists if a "significant" result has been obtained
False discovery rate (FDR): probability that a real effect does NOT exist if a "significant" result has been obtained
PPV and FDR are flipsides of the same coin (FDR = 100% - PPV)
A research finding is less likely to be true in case of
Small studies
Small effect sizes
Unintended Flawed Procedures Impact Data Quality
Sample Size Affects Sampling Distribution, Hence Statistical Approach
Central Limit Theorem: for random sampling with a large n, the sampling distribution is approximately normal, regardless of the underlying distribution
Note also that tests for normality will struggle if the sample size is small
Simulated data set of a skewed distribution: 1,000 randomly drawn samples for sample sizes n = 10 and n = 50
As the sample size increases, and the number of samples taken remains constant, the distribution of the 1,000 sample means becomes closer to the smooth line that represents the normal distribution
https://archive.cnx.org/contents/323cb760-5f99-4899-a96b-f3919f982a6a@10/using-the-central-limit-theorem#eip-id1169478792440
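A small sketch of this kind of simulation (assuming an exponential distribution as the skewed source, which is not specified on the slide) shows the sampling distribution of the mean becoming more symmetric as n grows.

```python
# CLT sketch: distribution of 1,000 sample means from a skewed (exponential)
# source, for n = 10 and n = 50. Skewness of the means shrinks toward 0 (normal).
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
for n in (10, 50):
    means = rng.exponential(scale=1.0, size=(1000, n)).mean(axis=1)
    print(f"n={n:3d}  mean of means={means.mean():.3f}  skewness={skew(means):.3f}")
```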
What is the Right Power?
Cohen (1988) proposed as a convention that, when the investigator has no other basis for setting the desired power value, the value .80 be used
Rationale:
1. α/β interrelationship: if 1 - β increases, β decreases and α increases (effect size and sample size being unchanged)
2. Relative error seriousness: scientists usually consider false positive claims more serious than false negative claims; if α = .05 and β = .20, the relative seriousness is .20/.05 (i.e., Type I errors are treated as 4 times as serious as Type II errors)
3. Feasibility: power > .90 would demand very large sample sizes; if 1 - β increases and α is kept constant, either the effect size must increase (may not be feasible) or the sample size must increase
There may be reasons to deviate from the convention
Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Lawrence Erlbaum Associates
How to Get to the Right Sample Size?
Based on the given variability and the expected difference between the effect of manipulation X and the effect of the control group, a minimum number of samples is required
Variability:
Should be provided in standard deviation units (σ)
Can be estimated from pilot studies or data reported in the literature
Can be calculated from SEM and n: σ = SEM ∗ √n
Effect size (Cohen's d):
Potential mean difference between any two groups in terms of standard deviation units
Signal (effect size of scientific interest) / noise (variability)
Can be estimated from pilot studies or data reported in the literature
Convention: small (Δ = 0.2), medium (Δ = 0.5), large (Δ = 0.8)*
If in doubt, hypothesize Δ = 0.5
*based on benchmarks by Cohen (1988); may be larger in case of animal studies: small (d = 0.5), medium (d = 1.0), large (d = 1.5)
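Both estimates are one-liners in code. A hedged helper sketch (the numeric inputs below are hypothetical pilot values, not from the slides):

```python
# Helpers for the two estimates above: recovering sigma from a reported SEM,
# and Cohen's d from pilot group data (pooled-SD version).
import numpy as np

def sd_from_sem(sem, n):
    """sigma = SEM * sqrt(n)"""
    return sem * np.sqrt(n)

def cohens_d(group1, group2):
    """Mean difference in units of the pooled standard deviation."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    pooled_sd = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
    return (g2.mean() - g1.mean()) / pooled_sd

print(sd_from_sem(sem=0.5, n=8))                                  # hypothetical values
print(cohens_d([4.1, 3.8, 4.5, 4.0], [5.2, 4.9, 5.6, 5.1]))       # hypothetical pilot data
```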
Power Depends on Experimental Design and Statistical Analysis
Example: two independent samples, comparing two means
Analysis: unpaired t-test
A t-test can be one- or two-tailed (two-tailed preferred) and paired (dependent samples) or unpaired (independent samples)
Assumption: comparison of two independent samples of equal n
Equal-sized samples are desirable: with a given number of cases available, equal division yields greater power than unequal division
Sample size calculation (one-tailed):
n = (Z_α + Z_(1-β))² ∗ 2σ² / Δ²
with
Z_α = constant according to the accepted α level [100(1 - α) percentile of the standard normal distribution], depending on whether the test is one- or two-tailed (in the latter case Z_(α/2))
Z_(1-β) = constant according to the power of the study [100(1 - β) percentile of the standard normal distribution]
σ = common population standard deviation
Δ = estimated effect size
Common population standard deviation: σ = √((σ1² + σ2²) / 2)
with σ1 = standard deviation of the control group, σ2 = standard deviation of the experimental group
Cohen's d: Δ = (μ1 - μ2) / σ
with μ1 = mean of the control group, μ2 = mean of the experimental group
Z-values for the t-test:
α                       0.01    0.05    0.1
Z_α (one-tailed)        2.326   1.645   1.282
Z_(α/2) (two-tailed)    2.576   1.960   1.645

1 - β                   0.8     0.90    0.95
Z_(1-β)                 0.842   1.282   1.645
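The formula translates directly into a short function; this is a sketch of the normal-approximation calculation above (not of any particular software package), and the example values are illustrative.

```python
# Sample size per group from the slide's formula:
#   n = (Z_alpha + Z_(1-beta))^2 * 2 * sigma^2 / delta^2
# where delta is the mean difference to detect and sigma the common SD.
# With a standardized effect size d = delta/sigma this reduces to 2*(Z_a + Z_b)^2 / d^2.
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80, two_tailed=True):
    z_alpha = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    z_power = norm.ppf(power)   # e.g. 0.842 for 80% power, matching the table above
    return math.ceil((z_alpha + z_power) ** 2 * 2 * sigma ** 2 / delta ** 2)

# Medium standardized effect (d = 0.5), alpha = 0.05, power = 0.80:
print(n_per_group(delta=0.5, sigma=1.0, two_tailed=True))    # ~63 per group
print(n_per_group(delta=0.5, sigma=1.0, two_tailed=False))   # ~50 per group (one-tailed)
```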
Finally: Software for Sample Size Calculation
G*Power (http://www.gpower.hhu.de): stand-alone program, Windows/Mac, free
PASS (https://www.ncss.com/software/pass): stand-alone program, Windows, not freely available
Various R packages: Windows/Mac/Linux, generally free
SAS, SPSS, Minitab, Microsoft Excel packages
NC3Rs Experimental Design Assistant (EDA) (https://eda.nc3rs.org.uk/experimental-design-group): free
If in doubt, ask your STATISTICIAN!
Know Your Research Question
Research question: does Intervention X alter the Primary Outcome Measure Y (POM)?
increase / decrease / increase or decrease
Null hypothesis (H0): Intervention X has NO effect on the POM
Alternative hypothesis (H1): Intervention X has an effect on the POM
directional (one-sided) or non-directional (two-sided)
The Protocol
Reflects the various layers in the design and execution of the experiment
Research Protocol: describes the design logic and forms the basis for reporting
Research question
Alternative hypothesis
Experimental design rationale
Measures to minimize bias
Choice of clinical model
Operationalization of responses
Statistical Protocol: describes design and analysis methods
Specific designs chosen
Power calculations
Description of the statistical model and tests used
Execution Protocol: describes practical actions, guidelines for lab techs
Intervention
Materials
Logistics
Data collection & processing
Personnel, responsibilities
Adapted from L. Bijnens, 2017
Relevance of Sample Size
[Figure: mean, SD and SEM for three samples of increasing size, n = 1 to n = 100; adapted from F Tekle]
Power Analysis
A priori: what do I need for my experiment and is it feasible?
What sample size is needed to detect a difference of a predefined effect size, with the anticipated variability and with a certain probability, given that this difference indeed exists?
Will I be able to run this experiment?
Can I get enough samples? Can I test all the samples? Can I afford the experiment?
If not, does it still make sense to run the experiment?
Post hoc: how reliable are the (reported) data?
What was the power of a reported study, given the observed effect size, variability and number of samples reported?
Can I "trust" the data?
Underpowered Studies: Still an Issue 5 Years Later
Efficacy of analgesics: effect of drug interventions in models of chemotherapy-induced peripheral neuropathy
Systematic search, inclusion of 341 publications by Nov 2015
Outcome measure (example models): number of comparisons, post-hoc power
Evoked limb withdrawal to mechanical stimuli (e.g., electronic "von Frey", mechanical monofilament, pin prick, pinch test): 648 comparisons, power 0.06
Evoked limb withdrawal or vocalisation to pressure (e.g., Randall-Selitto paw pressure): 235 comparisons, power 0.07
Evoked limb withdrawal to cold (e.g., acetone/ethylchloride/menthol, cold plate, cold water): 251 comparisons, power 0.06
Evoked limb withdrawal to heat (e.g., radiant heat, hot plate, paw immersion): 140 comparisons, power 0.07
Evoked tail withdrawal to cold (e.g., tail immersion): 50 comparisons, power 0.06
Evoked tail withdrawal to heat (e.g., tail flick, tail immersion): 38 comparisons, power 0.19
Complex pain-related behaviour (e.g., TRPA1 agonist- or capsaicin-evoked nocifensive behavior, burrowing activity, CPP, thermal place preference): 12 comparisons, power 0.10
Unpublished data, based on Currie et al., bioRxiv, 2018, http://dx.doi.org/10.1101/293480, courtesy of Ezgi Tanriver-Ayder, Gillian L. Currie, Emily Sena
More Likely for a Research Claim to be False Than True
A research finding is less likely to be true in case of
Small studies
Small effect sizes
Many statistical comparisons, fewer pre-defined criteria
Greater flexibility in designs, definitions, outcomes, and analytical modes
Example: Phenotypic Screen
Assumptions: pre-existing compound library; on average, 1/1000 compounds will be active; P(real) = 0.001
Test population: 10,000 compounds from the compound library
Real effect in 0.1% (10 compounds active); no effect in 99.9% (9,990 compounds inactive)
Power = 0.2: 20% (2 compounds) will show as true positives, 80% (8 compounds) as false negatives
Significance level = 0.05: 5% (499.5 compounds) will show as false positives, 95% (9,490.5 compounds) as true negatives
What is the probability that there is no efficacy even if a "significant" result has been obtained?
FDR = false positives / all positives
Modified from Colquhoun (2015) R Soc Open Sci 1: 140216
Working through the numbers (power = 0.2, α = 0.05):
FDR = false positives / all positives = 499.5 / (499.5 + 2) = 0.996 (99.6%)
→ more than 99% of the compounds flagged as "significant" are falsely detected as "active"
Making Statistics More Robust
Assumptions: power increased to 0.8; α set to 0.001
Test population: 10,000 compounds, P(real) = 0.001 (10 active, 9,990 inactive)
Power = 0.8: 80% (8 compounds) will show as true positives, 20% (2 compounds) as false negatives
Significance level = 0.001: 0.1% (10 compounds) will show as false positives, 99.9% (9,980 compounds) as true negatives
FDR = 10 / (10 + 8) = 0.55 (55%)
→ still 55% of the compounds flagged as "significant" are falsely detected as "active"
Modified from Colquhoun (2015) R Soc Open Sci 1: 140216
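The screening arithmetic above can be wrapped in a short function. This is an illustrative sketch (not from the slides); the two calls reproduce the 99.6% and 55% figures.

```python
# False discovery rate of a screen, given prevalence of true effects, power and alpha.
def screen_fdr(n_total, prevalence, power, alpha):
    true_active = n_total * prevalence
    inactive = n_total - true_active
    true_pos = power * true_active        # correctly detected actives
    false_pos = alpha * inactive          # inactives crossing the significance threshold
    return false_pos / (false_pos + true_pos)

# Slide values: 10,000 compounds, 1/1000 truly active
print(screen_fdr(10_000, 0.001, power=0.2, alpha=0.05))    # ~0.996
print(screen_fdr(10_000, 0.001, power=0.8, alpha=0.001))   # ~0.555
```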
Example: Cell Culture
Cells suspended and pipetted into the wells of a microtiter plate, treatment randomized to wells, replicates on multiple days
Experimental material (cells) is artificially homogeneous; experimental conditions are very narrowly defined
→ Experimental unit: the well, not the individual cells (possible)
→ Better: the replicates (e.g., runs on different days), not the wells; well plates run on one day would then be subsamples
This improves robustness / consistency / generalizability
The choice of experimental unit becomes a scientific judgement!
Multiple Interventions 1
Example 1: new drug against vehicle and multiple active comparators
Q: do the drugs differ from vehicle and when compared to each other?
Multiple (m) interventions, independent samples, multiple comparisons (comparing more than 2 means)
Stats: curve fitting if possible (e.g., to calculate and compare EC50's); often not possible → ANOVA (overall effect), post-hoc test (pairwise comparisons of all conditions with each other)
Sample size calculation similar to the unpaired t-test:
Take the desired effect size
Calculate the population variance
Use the Z-values for the t-test
Balanced design; comparisons are estimated best if the same number of samples is allocated per group
Multiple Interventions 2
Example 2: dose-response study with planned comparisons
Q: is drug X active when compared to vehicle?
Multiple (m) interventions, independent samples, 1 control, pairwise comparisons
Stats: ANOVA (overall effect), planned comparisons (individual drug doses vs. vehicle)
Sample size calculation similar to the unpaired t-test
But: the number of samples in the control group should be √m times the number of samples in the intervention groups (Bate & Karp, PLOS One, 2014):
Nc = √m ∗ nm
with Nc = adjusted control group size, m = number of interventions (e.g., drug doses tested), nm = calculated sample size per intervention group
Unbalanced design; gains sensitivity at the cost of multiple comparisons
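As a small sketch of this adjustment (the example values are illustrative, not from the slides):

```python
# Control-group size when m intervention groups share one control (Bate & Karp, 2014):
# the control group is sized sqrt(m) times the per-intervention sample size.
import math

def adjusted_control_n(m, n_per_intervention):
    return math.ceil(math.sqrt(m) * n_per_intervention)

print(adjusted_control_n(m=4, n_per_intervention=8))   # e.g. 4 doses of 8 -> control of 16
```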
Multiple Interventions 3
Example 3: comparison of 2 drugs, all samples receive all treatments
Q: is one drug better than the other drug?
Multiple (m) interventions, dependent samples
Stats: cross-over design, repeated measures ANOVA
Sample size calculation uses the same formula as for the unpaired t-test
But: the variability used is the within-sample standard deviation σw
σw = √(WMSE), with WMSE = within mean square error from the ANOVA table
Special Cases and Additional Requirements
Large sample size due to high variability → consider covariates and/or blocking factors
Anticipated dropouts → add samples
Studies designed to show lack of effect (zero-data)? → need high power, and the variability (95% CI) must cover only effect sizes too small to be considered relevant
Other statistical approaches → ask your statistician!
How to Deal with…
High variability?
Consider using covariates
Y = βX1 + ε, with Y = measurement, X1 = explanatory variable and ε = error term
Y = β0 + β1X1 + β2X2 + ... + ε, adding covariates X2, ... as further explanatory variables
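A minimal sketch of such a covariate-adjusted analysis (assuming pandas and statsmodels are available; the data, variable names and effect sizes below are simulated for illustration):

```python
# Fitting Y = b0 + b1*treatment + b2*baseline + error, i.e. adjusting the
# treatment comparison for a covariate (here a hypothetical baseline measurement).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 40
baseline = rng.normal(10, 2, n)                      # covariate X2
treatment = np.repeat(["control", "drug"], n // 2)   # explanatory variable X1
y = baseline + (treatment == "drug") * 1.5 + rng.normal(0, 1, n)  # simulated outcome

df = pd.DataFrame({"y": y, "treatment": treatment, "baseline": baseline})
fit = smf.ols("y ~ treatment + baseline", data=df).fit()
print(fit.params)   # the treatment coefficient is now adjusted for baseline variability
```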
How to Deal with…
High variability?
Consider blocking factors
A blocking factor is a variable that has an effect on the experimental outcome, but is itself of no interest: age, gender, experimenter, time of day, batches, …
Use block designs to reduce unexplained variability
Key concept: experimental units within one block are more 'alike' than others, so the variability within each block is less than the variability of the entire sample → increases the efficiency of estimating an effect
How to Deal with…
Anticipated dropouts?
Consider the need for additional samples!
E.g., x% reduced viability of offspring from a mutant mouse line
N1 = n / (1 - (z/100))
with N1 = adjusted sample size, n = calculated sample size, z = anticipated dropout rate (%)
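As a one-line sketch of the adjustment (illustrative numbers only):

```python
# Inflate the calculated sample size for an anticipated dropout rate: N1 = n / (1 - z/100)
import math

def adjust_for_dropout(n, dropout_pct):
    return math.ceil(n / (1 - dropout_pct / 100))

print(adjust_for_dropout(n=12, dropout_pct=20))   # 12 / 0.8 -> 15 animals needed
```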
How to Deal with…
Studies designed to show lack of effect (zero-data)?
Example: attempt to confirm published work
Minimize false negatives! → power >> .80!
Demonstrate that the result is conclusive! → show that the effect size is less than originally reported
A non-significant result may or may not be conclusive:
If the 95% confidence interval overlaps with an important effect size (beyond ±d_min), we cannot exclude the possibility that there is an important impact: the groups are not different, but also not equivalent
If the 95% confidence interval lies entirely within ±d_min, the groups are not different and equivalent
What is convincingly negative?
High research rigor: robust and unbiased experimental design, methodology, analysis, interpretation, and reporting
Authors of the original paper consulted (if possible)
Properly validated methods
No evidence for technical failure
Converging evidence on multiple readouts (if possible)
Ideally a multi-lab collaborative effort
Adequate data analysis methods establish the observed results as statistically negative
No experiment can provide absolute proof of the absence of an effect, i.e., p > 0.05 does not prove a negative result
Often a more rigorous design and stronger statistical power are required than in the original report (a power of 0.2 is not sufficient!)
Confidence intervals should be narrow and cover only those effect sizes that aren't considered scientifically relevant
Full access to methods and raw data is provided
Results likely to have a significant impact if published
Modified from: Bespalov et al., unpublished
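A hedged sketch of the confidence-interval argument (the data are simulated and the smallest relevant effect d_min is an assumed value; a formal equivalence test such as TOST would follow the same logic):

```python
# "Convincingly negative" check: is the 95% CI of the group difference entirely
# inside the interval (-d_min, +d_min) of effects judged too small to matter?
import numpy as np
from scipy.stats import t

def diff_ci(a, b, conf=0.95):
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    dof = len(a) + len(b) - 2                      # simple (pooled) approximation
    half = t.ppf(1 - (1 - conf) / 2, dof) * se
    return diff - half, diff + half

rng = np.random.default_rng(0)
low, high = diff_ci(rng.normal(0, 1, 30), rng.normal(0, 1, 30))
d_min = 0.8                                        # smallest effect considered relevant (assumed)
equivalent = (-d_min < low) and (high < d_min)
print(f"95% CI of difference: ({low:.2f}, {high:.2f}); equivalent: {equivalent}")
```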
Choosing the Appropriate Power Calculation
https://eda.nc3rs.org.uk/experimental-design-group
Additional Strategies to Increase Power
Use as few treatment groups as possible
Investigate only main effects rather than interactions
Use direct rather than indirect dependent variables
Use sensitive measures
Use reliable measures
Use covariates and/or blocking variables
Use cross-over designs