This software is intended to be useful
in planning statistical studies. It is not intended to be used
for analysis of data that have already been collected.
Each selection provides a graphical
interface for studying the power of one or more tests. They include sliders
(convertible to number-entry fields) for varying parameters, and a
provision for graphing one variable against another.
Each dialog window also
offers a Help
menu. Please read the Help menus before
contacting me with questions.
The "Balanced ANOVA" selection
provides another dialog with a list of several popular experimental
designs, plus a provision for specifying your own model.
The dialogs open in separate
windows. If you're
running this on an Apple Macintosh, the applets' menus are added to the
screen menubar -- so, for example, you'll have
two "Help" menus there!
You may also download this
software to run it
on your own PC.
These require a web
browser capable of running Java applets (version 1.1 or higher). If you
do not see a selection list above, chances are that either you have
disabled Java, your
browser is not new enough, or you need to download a JRE plug-in from java.sun.com.
Due to a
compatibility bug, many plug-ins size the applet window before allowing
for an additional strip with a security warning; drag the bottom of
the window downward a bit to compensate.
If you use this software in preparing a research paper, grant proposal,
or other publication, I would appreciate your acknowledging it by
citing it in the references. Here is a suggested bibliography
entry in APA or "author (date)" style:
Lenth, R. V.
(2006). Java Applets
for Power and Sample Size [Computer software]. Retrieved month
day, year, from
This form of the citation is appropriate whether you run it online
(give the date you ran it) or the stand-alone version (give the date
you downloaded it).
Downloading to run locally
The file piface.jar may be
downloaded so that you can run these applications locally. [Note: Some mail software
(which thinks it is smarter than you) renames this file piface.zip.
If this happens, simply rename it piface.jar;
do not unzip the file.]
You may also want the icon file piface.ico
if you put it on your desktop or a toolbar. You
will need to have the Java Runtime Environment (JRE) or the Java
Development Kit (JDK) installed on your system. You probably
already have it; but if not, these are available for free download for
several platforms from Sun.
If you have JDK or JRE version 1.2 or later, then you can probably run the
application just by double-clicking on piface.jar.
Otherwise, you may run it from the command line in a terminal or DOS window, using
a command like
java -jar piface.jar
This will bring up a selector list similar to the one in this web
page. A particular dialog can also be run directly from the
command line, if you know its name (can be discovered by browsing piface.jar
with a zip file utility such as WinZip).
For example, the two-sample t-test
dialog may be run using
This software is made available as-is, with no guarantees; use it at
your own risk. I welcome comments on bugs, additional
capabilities you'd like to see, etc. I am also willing to provide
minimal support if you truly don't understand what inputs are
required. However, each applet has a help menu, and I do ask
that you carefully read that before you e-mail me with questions.
If you need statistical advice on your research problem, you should
contact a statistical consultant; and if you want expert advice, you
should expect to pay for it. Most universities with statistics
departments or statistics programs also offer a consulting
service. If you think your research is important, then it is
important to get good advice on the statistical design (i.e., before you start
collecting data).
If you have carefully read the
above two paragraphs, and still find it
appropriate to contact me, my e-mail address is firstname.lastname@example.org.
Here are two
very wrong things that people try to do with my software:
Retrospective power (a.k.a. observed power, post hoc power). You've got the data,
did the analysis, and did not achieve "significance." So you compute
power retrospectively to see if the test was powerful enough or
not. This is an empty question. Of course it wasn't
powerful enough -- that's why the result isn't significant. Power
calculations are useful for design, not analysis.
(Note: These comments refer to power computed based on the
observed effect size and sample size. Considering a different
sample size is obviously prospective in nature. Considering a
different effect size might make sense, but probably what you really
need to do instead is an equivalence test; see Hoenig and Heisey, 2001.)
T-shirt effect sizes
("small", "medium", and "large"). This is an elaborate way to
arrive at the same sample size that has been used in past social
science studies of large, medium, and small size (respectively).
The method uses a standardized effect size as the goal. Think
about it: for a "medium" effect size, you'll choose the same n regardless of the
reliability of your instrument, or the narrowness or diversity of your
subjects. Clearly, important considerations are being ignored
here. "Medium" is definitely not the message!
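Both of the points above can be checked numerically. The following is a minimal Python sketch, not code taken from the applets. It assumes a two-sided z-test with known variance (a normal approximation to the t-test), and the function names `observed_power` and `n_per_group` are invented for illustration.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def observed_power(p_value, alpha=0.05):
    """Observed (retrospective) power of a two-sided z-test.

    Assumption: the observed effect is plugged in as if it were the true
    effect, so the result depends on the data only through the p-value.
    """
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    z_obs = _Z.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    upper = 1 - _Z.cdf(z_crit - z_obs)   # reject in the observed direction
    lower = _Z.cdf(-z_crit - z_obs)      # reject in the opposite direction
    return upper + lower


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for comparing two means.

    delta is the difference to detect and sigma the common standard
    deviation, both in raw units. Only the ratio d = delta / sigma
    (the standardized effect size) enters the formula.
    """
    d = delta / sigma
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)
```

Under these assumptions, `observed_power(0.05)` is almost exactly one half, and the value only drops as the p-value rises, so a nonsignificant result always comes with low observed power. Likewise, `n_per_group(5, 10)` and `n_per_group(0.5, 1)` agree because both have a standardized effect of 0.5, whereas keeping the raw difference at 5 and doubling the standard deviation to 20 calls for roughly four times as many subjects.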
Here are three
very right things you can do:
Use power prospectively for planning future studies. Software such
as is provided on this website is useful for determining an appropriate
sample size, or for evaluating a planned study to see if it is likely
to yield useful information.
Put science before statistics. It is easy to get caught up in
statistical significance and such; but studies should be designed to
meet scientific goals, and you need to keep those in sight at all times
(in planning and
analysis). The appropriate inputs to power/sample-size
calculations are effect sizes that are deemed clinically important,
based on careful considerations of the underlying scientific (not
statistical) goals of the study. Statistical considerations are
used to identify a plan that is effective in meeting scientific goals
-- not the other way around.
Do pilot studies. Investigators tend to try to answer all the world's questions with one
study. However, you usually cannot do a definitive study in one
step. It is far better to work incrementally. A pilot study
helps you establish procedures, understand and protect against things
that can go wrong, and obtain variance estimates needed in determining
sample size. A pilot study with 20-30 degrees of freedom for
error is generally quite adequate for obtaining reasonably reliable
sample-size estimates.
Many funding agencies require a power/sample-size section in grant
proposals. Following the above guidelines is good for
your chances of being funded. You will have established that you
have thought through the scientific issues, that your procedures are
sound, and that you have a defensible sample size based on realistic
variance estimates and scientifically tenable effect-size goals.
To read more, please see the following references:
Lenth, R. V. (2001), ``Some Practical Guidelines for Effective
Sample Size Determination,'' The American Statistician, 55,
Hoenig, John M. and Heisey, Dennis M. (2001), ``The Abuse of
Power: The Pervasive Fallacy of Power Calculations for Data Analysis,''
The American Statistician, 55,
An earlier draft of the Lenth reference above is _here_,
and a shorter summary of some comments I made in a panel discussion at
the 2000 Joint Statistical Meetings in Indianapolis is _here_.
Additional brief comments, prepared as a handout for my
presentation at the 2001 Joint Statistical Meetings in Atlanta, are _here_.
Most computations are ``exact'' in the sense that they are based on
exact formulas for sample size, power, etc. The exception is
Satterthwaite approximations; see below.
Even with exact formulas, computed values are inexact, as are all
double-precision floating-point computations. Many computations (especially
noncentral distributions) require summing one or more series, and there
is a serious tradeoff between speed and accuracy. The error bound
set for cdfs is 1E-8 or smaller, and for quantiles the bound is
Actual errors can be much larger due to accumulated errors or other reasons.
Quantiles, for example, are computed by numerically solving an equation
involving the cdf; thus, in extreme cases, a small error in the cdf can
create a large error in the quantile.
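The sensitivity of extreme quantiles to small cdf errors can be illustrated with a short Python sketch. This is not the applets' algorithm: plain bisection stands in for whatever root-finder is actually used, the standard normal stands in for the noncentral distributions, and the names `quantile_by_bisection` and `quantile_shift` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile_by_bisection(cdf, p, lo=-40.0, hi=40.0, tol=1e-10, max_iter=200):
    """Solve cdf(x) = p for x by bisection (cdf must be nondecreasing)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            return 0.5 * (lo + hi)
    raise RuntimeError("too many iterations")


def quantile_shift(p, cdf_error=1e-8):
    """How far the computed quantile moves if the cdf is off by cdf_error."""
    exact = quantile_by_bisection(STD_NORMAL.cdf, p)
    perturbed = quantile_by_bisection(
        lambda x: STD_NORMAL.cdf(x) + cdf_error, p
    )
    return abs(perturbed - exact)
```

Comparing `quantile_shift(0.5)` with `quantile_shift(1 - 1e-7)` shows the effect: the same 1E-8 error in the cdf moves the median by a negligible amount, but moves the extreme quantile by several orders of magnitude more, because the cdf is nearly flat in the tail.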
A warning (typically, ``too many iterations'') is generated when an
error bound is not detected to have been achieved. However, in
the case of quantile computations, no warning message is generated for
extreme quantiles. If you want a power of .9999 at
you can expect the computed
sample size to not be accurate to the nearest
If you specify reasonable criteria, the answers will be pretty reliable.
Some of the dialogs (two-sample t, mixed ANOVA) implement Satterthwaite
approximations when certain combinations of inputs require an error term
to be constructed. These are of course not exact, even in their
formulation. Moreover, the Satterthwaite degrees of freedom are
used as-is in computing power from a noncentral t or
noncentral F distribution, and this introduces errors
that could be large in some cases.
In the two-sample t setting, I'd expect the worst errors
when there is a huge imbalance in sample sizes and/or variances. In
the dialogs for mixed ANOVA models (either F tests or
comparisons/contrasts), I expect these errors to get worse as more
variance components are involved, especially when one or more of them
is given negative weight.
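For the two-sample t case, the Satterthwaite degrees of freedom are conventionally given by the Welch-Satterthwaite formula. The sketch below assumes that textbook formula; whether the dialogs compute it in exactly this way is not confirmed here, and the function name `satterthwaite_df` is hypothetical.

```python
def satterthwaite_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for two samples.

    var1, var2 are the group variances and n1, n2 the group sample sizes.
    The result lies between min(n1, n2) - 1 and n1 + n2 - 2.
    """
    a = var1 / n1  # variance of the first sample mean
    b = var2 / n2  # variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

With equal variances and equal sample sizes, for example `satterthwaite_df(1, 10, 1, 10)`, the formula returns the pooled value n1 + n2 - 2. With a large imbalance, for example `satterthwaite_df(100, 5, 1, 500)`, it collapses to about n1 - 1, so nearly all of the information about the error comes from the small, noisy group.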