Rules of thumb for minimum sample size for multiple regression

Within the context of a research proposal in the social sciences, I was asked the following question:

I have always gone by 100 + m (where m is the number of predictors) when determining minimum sample size for multiple regression. Is this appropriate?

I get similar questions a lot, often with different rules of thumb. I've also read such rules of thumb quite a lot in various textbooks. I sometimes wonder whether a rule's popularity, in terms of citations, is based on how low it sets the standard. However, I'm also aware of the value of good heuristics in simplifying decision making.

Questions:

asked Apr 28, 2011 at 6:40 by Jeromy Anglim

7 Answers


I'm not a fan of simple formulas for generating minimum sample sizes. At the very least, any formula should consider effect size and the questions of interest. And the difference between either side of a cut-off is minimal.

Sample size as optimisation problem

A Rough Rule of Thumb

In terms of very rough rules of thumb within the typical context of observational psychological studies involving things like ability tests, attitude scales, personality measures, and so forth, I sometimes think of:

These rules of thumb are grounded in the 95% confidence intervals associated with correlations at these respective levels and the degree of precision with which I'd like to understand the relations of interest. However, it is only a heuristic.

G*Power 3

Multiple Regression tests multiple hypotheses

Accuracy in Parameter Estimation

I also like Ken Kelley and colleagues' discussion of Accuracy in Parameter Estimation.

answered Apr 28, 2011 at 16:03 by Jeromy Anglim

I prefer not to think of this as a power issue, but rather to ask "how large should $n$ be so that the apparent $R^2$ can be trusted?" One way to approach that is to consider the ratio or difference between $R^2$ and $R^2_{adj}$, the latter being the adjusted $R^2$ given by $1 - (1 - R^2)\frac{n-1}{n-p-1}$, which forms a less biased estimate of the "true" $R^2$.

Some R code can be used to solve for the factor of $p$ that $n-1$ should be, such that $R^2_{adj}$ is only a factor $k$ smaller than $R^2$ (relative drop) or is smaller than $R^2$ by $k$ (absolute drop).

require(Hmisc)
# dop(k, type) plots the required multiple of p against R^2;
# its definition was lost from this copy of the post.
par(mfrow=c(1,2))
dop(c(.9, .95, .975), 'relative')
dop(c(.075, .05, .04, .025, .02, .01), 'absolute')

[Figure: two-panel plot produced by the code above]

Legend: Multiple of $p$ that $n-1$ must be so that the drop from $R^2$ to $R^2_{adj}$ equals the indicated relative factor (left panel, 3 factors) or absolute difference (right panel, 6 decrements).
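The relationship behind these curves has a closed form: rearranging the adjusted-$R^2$ formula above gives the required multiple of $p$ directly. A minimal Python sketch (my own rearrangement, not the original dop code; the function name is mine):

```python
def n_multiplier(R2, k, mode="relative"):
    """Multiple of p that n - 1 must be so that the adjusted R^2,
    1 - (1 - R2) * (n - 1) / (n - p - 1), equals k * R2 ("relative")
    or R2 - k ("absolute")."""
    if mode == "relative":
        # Solve 1 - (1 - R2) * m / (m - 1) ... for m = (n - 1) / p
        return (1 - k * R2) / (R2 * (1 - k))
    return (1 - R2 + k) / k

# Example: for R^2 = 0.5, keeping adjusted R^2 within 90% of R^2
# requires n - 1 to be about 11 times the number of predictors.
print(n_multiplier(0.5, 0.9))               # relative drop to 0.45
print(n_multiplier(0.5, 0.05, "absolute"))  # absolute drop of 0.05
```

As a sanity check: with $p=10$ predictors this gives $n \approx 111$, and plugging $n=111$, $p=10$, $R^2=0.5$ into the adjusted-$R^2$ formula indeed returns $0.45$.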

If anyone has seen this already in print please let me know.

answered May 15, 2013 at 14:34 by Frank Harrell

+1. I suspect I'm missing something rather fundamental and obvious, but why should we use the ability of $\hat R^2$ to estimate $R^2$ as the criterion? We already have access to $R^2_{adj}$, even if $N$ is low. Is there a way to explain why this is the right way to think about the minimally adequate $N$, beyond the fact that it makes $\hat R^2$ a better estimate of $R^2$?

Commented May 15, 2013 at 14:53

@FrankHarrell: look here: the author seems to be using the plots on pp. 260-263 in much the same way as the ones in your post above.

Commented May 15, 2013 at 17:35

Thanks for the reference. @gung that's a good question. One (weak) answer is that in some types of models we don't have an $R^2_{adj}$, and we also don't have an adjusted index if any variable selection has been done. But the main idea is that if $R^2$ is unbiased, other indexes of predictive discrimination, such as rank correlation measures, are likely to be unbiased also, due to adequacy of the sample size and minimal overfitting.

Commented May 16, 2013 at 13:46

(+1) for what is, in my opinion, a crucial question.

In macro-econometrics you usually have much smaller sample sizes than in micro, financial, or sociological experiments. A researcher feels quite comfortable when one can provide at least feasible estimates. My personal minimal rule of thumb is $4\cdot m$ ($4$ degrees of freedom per estimated parameter). In other applied fields you are usually luckier with data (if it is not too expensive, just collect more data points), and you may ask what the optimal size of a sample is (not just a minimum value). The latter issue comes from the fact that a larger sample of low-quality (noisy) data is not better than a smaller sample of high-quality data.
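For concreteness, the questioner's $100+m$ rule and the $4\cdot m$ floor diverge sharply as $m$ grows; a quick illustrative sketch (function names are mine):

```python
def rule_100_plus_m(m):
    """The questioner's rule: 100 observations plus one per predictor."""
    return 100 + m

def rule_4m(m):
    """The macro-econometrics floor above: 4 observations per parameter."""
    return 4 * m

for m in (3, 10, 33, 100):
    print(m, rule_100_plus_m(m), rule_4m(m))
```

The $4\cdot m$ floor is the smaller of the two until $m$ exceeds about 33, after which it demands the larger sample.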

Most sample-size recommendations are linked to the power of the tests for the hypotheses you are going to test after you fit the multiple regression model.

There is a nice calculator that could be useful for multiple regression models, with some formulas behind the scenes. I think such an a priori calculator could easily be applied by a non-statistician.
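I can't reproduce that calculator here, but the standard computation behind such tools, namely the power of the overall $F$ test via the noncentral $F$ distribution with Cohen's $f^2$ as the effect size, can be sketched with SciPy (the function name and defaults are my own, not the calculator's):

```python
from scipy import stats

def required_n(f2, m, alpha=0.05, power=0.80, max_n=100000):
    """Smallest n giving the target power for the overall F test of a
    multiple regression with m predictors and effect size Cohen's f^2."""
    for n in range(m + 2, max_n):
        v = n - m - 1                        # denominator degrees of freedom
        ncp = f2 * n                         # noncentrality parameter
        f_crit = stats.f.ppf(1 - alpha, m, v)
        if stats.ncf.sf(f_crit, m, v, ncp) >= power:
            return n
    return None

# A "medium" effect (f^2 = 0.15) with 5 predictors needs roughly 90 cases.
print(required_n(0.15, 5))
```

This is the same calculation G*Power 3 performs for the "linear multiple regression: fixed model, $R^2$ deviation from zero" test.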

Probably the K. Kelley and S. E. Maxwell article may be useful for answering the other questions, but I need more time to study the problem first.

answered Apr 28, 2011 at 12:11 by Dmitrij Celov

Commented Sep 7, 2022 at 18:44

Your rule of thumb is not particularly good if $m$ is very large. Take $m=500$: your rule says it's OK to fit $500$ variables with only $600$ observations. I hardly think so!

For multiple regression, you have some theory to suggest a minimum sample size. If you are going to use ordinary least squares, then one of the assumptions you require is that the "true residuals" be independent. Now when you fit a least squares model with $m$ variables, you are imposing $m+1$ linear constraints on your empirical residuals (given by the least squares or "normal" equations). This implies that the empirical residuals are not independent: once we know $n-m-1$ of them, the remaining $m+1$ can be deduced, where $n$ is the sample size. So we have a violation of this assumption. Now the order of the dependence is $O\left(\frac{m+1}{n}\right)$. Hence if you choose $n=k(m+1)$ for some number $k$, then the order is given by $O\left(\frac{1}{k}\right)$. So by choosing $k$, you are choosing how much dependence you are willing to tolerate. I choose $k$ in much the same way as for applying the "central limit theorem": $10$-$20$ is good, and we have the "stats counting" rule $30\equiv\infty$ (i.e. the statistician's counting system is $1,2,\dots,26,27,28,29,\infty$).
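The $m+1$ constraints are easy to see numerically: the normal equations force $X^\top e = 0$ exactly, where $e$ is the vector of empirical residuals. A small NumPy sketch on simulated data of my own:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3
# Design matrix: intercept plus m predictors, so m + 1 columns.
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
y = X @ rng.normal(size=m + 1) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# The least squares (normal) equations impose X'e = 0:
# m + 1 exact linear constraints on the n empirical residuals.
print(np.max(np.abs(X.T @ resid)))  # numerically zero
```

So only $n - m - 1$ of the residuals carry free information, which is why the dependence among them is of order $(m+1)/n$.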