StarStat: Frequently Asked Questions (FAQs) about significance testing
What is “Statistical Significance”?
A sample of observations, such as responses to a questionnaire, will generally not reflect exactly the population from which it is drawn. As long as there is variation among individuals, sample results will be affected by the particular mix of individuals chosen. “Sampling error” (or conversely, “sample precision”) refers to the amount of variation likely to exist between a sample result and the actual population value.
“Confidence level” qualifies a statistical statement by expressing how unlikely it is that the observed result can be explained by sampling error alone. To say that an observed difference is significant at the 95% confidence level is to say that, if no real difference existed in the population, sampling error alone would produce a difference this large less than 5% of the time.
“Sampling error” and “confidence levels” work hand-in-hand. The same difference may be significant at a lower confidence level but not at a higher one. For example, we might be able to state that we are 95% confident that a sample result falls within a certain range of the true population level. However, we can be 90% confident that our sample result falls within some narrower range of the population level.
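The trade-off between confidence level and range can be illustrated with a short Python sketch. It assumes a simple random sample and the usual normal approximation for a sample proportion; the function name `margin_of_error` and the example figures (a 50% result from 400 respondents) are hypothetical and are not taken from StarStat itself.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence):
    """Half-width of the normal-approximation interval for a sample proportion.

    p          -- observed proportion (0..1)
    n          -- sample size
    confidence -- confidence level as a fraction, e.g. 0.95
    """
    # Two-sided critical value: 1.96 for 95%, about 1.645 for 90%, etc.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# Hypothetical survey: 50% of 400 respondents chose an answer.
for level in (0.90, 0.95, 0.99):
    moe = margin_of_error(0.5, 400, level)
    print(f"{level:.0%} confidence: +/- {moe:.1%}")
```

For this example the range widens from roughly 4.1 points at 90% confidence to roughly 4.9 points at 95% and 6.4 points at 99%: the more confident the statement, the broader the range it has to cover.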
When comparing two observed survey results, the underlying principles are the same. The key question of whether a difference is “significant” can be restated as “is the difference larger than normal sampling error would allow for?” This is the question that z-tests and t-tests help answer.
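As a rough illustration of how such a comparison works, the sketch below runs a pooled two-proportion z-test in Python. This is a textbook form of the test, offered under the assumption of two independent simple random samples; it is not necessarily the exact formula StarStat uses, and the function name `two_proportion_z` and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Pooled z-test for the difference between two independent proportions.

    Returns the z statistic and its two-tailed probability.
    """
    # Pooled proportion under the hypothesis of no real difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference, i.e. the "normal sampling error".
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical results: 55% of 400 respondents vs. 48% of 400 respondents.
z, p = two_proportion_z(0.55, 400, 0.48, 400)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
significant_at_95 = p < 0.05
```

A two-tailed probability below 0.05 corresponds to significance at the 95% confidence level; below 0.10, to significance at the 90% level.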
How large a sample is enough?
To determine an appropriate sample size, it is necessary to consider the maximum sampling error we are willing to accept, as well as the confidence level desired. Different research requires different degrees of reliability or “precision”, depending on the specific objectives and possible consequences of the survey findings.
Often, an “acceptable” level of error used by survey researchers is between 5% and 10% at the 95% confidence level.
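To give a sense of what these error levels imply, the following Python sketch estimates the sample size needed for a given maximum error. It assumes a simple random sample from a large population, the normal approximation, and the most conservative case of a 50% result; the function name `required_sample_size` is hypothetical, and the calculation is a standard textbook formula rather than a description of the sample precision calculator.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose margin of error does not exceed `margin`.

    margin     -- maximum acceptable error as a fraction, e.g. 0.05
    confidence -- confidence level as a fraction, e.g. 0.95
    p          -- expected proportion; 0.5 gives the largest (safest) n
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z * sqrt(p * (1 - p) / n) for n and round up.
    return ceil(z * z * p * (1 - p) / margin ** 2)


print(required_sample_size(0.05))  # 5% error at 95% confidence
print(required_sample_size(0.10))  # 10% error at 95% confidence
```

Under these assumptions, a 5% error at the 95% confidence level calls for about 385 respondents, while a 10% error calls for about 97. Halving the acceptable error roughly quadruples the required sample.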
Another factor in determining sample size is the number of subgroups to be analyzed. A researcher will want to be sure that the smallest subgroup will be large enough to ensure reliable results.
The sample precision calculator can provide data to help a researcher make an informed decision regarding sample reliability.
Is a 1- or 2-tailed test appropriate?
A one-tailed test is appropriate when a directional difference is implied in the hypothesis being tested (e.g., “Group A will score higher than Group B”). A two-tailed test, on the other hand, tests the hypothesis that the two groups are “different”, regardless of the direction of the difference (e.g., “Group A and Group B perform differently” on a certain measure).
While StarStat provides both 1- and 2-tailed probabilities, significance test results are displayed based on 2-tailed probabilities.
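The relationship between the two probabilities can be sketched in Python for a z statistic. The function name `tail_probabilities` and the example value z = 1.75 are hypothetical, and the one-tailed figure assumes the hypothesis predicted that Group A would score higher than Group B.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return (one-tailed, two-tailed) probabilities for a z statistic.

    The one-tailed value assumes the predicted direction is A > B,
    so only the upper tail of the normal distribution is counted.
    """
    normal = NormalDist()
    one_tailed = 1 - normal.cdf(z)
    # Two-tailed: a difference this large in either direction.
    two_tailed = 2 * (1 - normal.cdf(abs(z)))
    return one_tailed, two_tailed


one, two = tail_probabilities(1.75)
print(f"1-tailed p = {one:.3f}, 2-tailed p = {two:.3f}")
```

For z = 1.75 the one-tailed probability is about 0.040 and the two-tailed probability about 0.080, so the same result passes the 95% threshold as a directional test but not as a two-tailed test. This is why the choice of test should follow from the hypothesis, not from the data.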
Documentation and original code copyright 1995-2022 by DataStar, Inc. (East Weymouth, MA). Reproduction of material for non-commercial purposes is permitted, without charge, provided that suitable reference is made to StarStat and DataStar, Inc. by including this notice intact. Neither StarStat nor its documentation should be modified in any way without written permission from DataStar, Inc. StarStat was previously known as Starware/Stat.
StarStat is provided “as is” without warranty of any kind. The entire risk as to the quality, performance, and fitness for intended purpose is with you. You assume responsibility for the selection of the test and for the use of results obtained.
StarStat credits the following sources for formulas used:
- Huntsberger, D. V. and Billingsley, P. (1975), “Statistical Inference for Management and Economics”, Allyn and Bacon, Inc., Boston.
- Hastings, C. (1955), “Approximations for Digital Computers”, Princeton University Press, Princeton, NJ.
- Matre, J. and Gilbreath, G. (1987), “Statistics for Business and Economics”, 3rd Edition, Business Publications, Inc., Plano, TX.
- Reisman, Joel (1994), “Polynomial Approximation to t-Distribution”