From Wikipedia, the free encyclopedia
Statistical significance is a statistical assessment of whether observations reflect a genuine pattern rather than chance alone. The fundamental challenge is that any partial picture of a given hypothesis, poll or question is almost always subject to random error. In statistical testing, a result is deemed statistically significant if it is so extreme (assuming no external variables influence the outcome) that it would arise by chance only in rare circumstances. Hence the test provides enough evidence to reject the hypothesis of 'no effect'. When used in statistics, the word significant does not mean important or meaningful, as it does in everyday speech.
Researchers focusing solely on whether individual test results are significant may miss important response patterns that individually fall below the threshold set for tests of significance. Therefore, along with tests of significance, it is preferable to examine effect-size statistics, which describe how large the effect is and the uncertainty around its estimate, so that the reader can gauge the practical importance of the effect.
The calculated statistical significance of a result is in principle only valid if the hypothesis was specified before any data were examined. If, instead, the hypothesis was specified after some of the data were examined, and specifically tuned to match the direction in which the early data appeared to point, the calculation would overestimate statistical significance.
An alternative (but related) statistical hypothesis testing framework is the Neyman–Pearson frequentist school, which requires both a null and an alternative hypothesis to be defined and investigates the repeat-sampling properties of the procedure: the probability that the null hypothesis will be rejected when it is in fact true and should not have been rejected (a "false positive" or Type I error), and the probability that the null hypothesis will be accepted when it is in fact false (a "false negative" or Type II error). Fisherian p-values are philosophically different from Neyman–Pearson Type I error rates; this confusion is unfortunately propagated by many statistics textbooks.
The significance level is usually denoted by the Greek symbol α (lowercase alpha). Popular levels of significance are 10% (0.1), 5% (0.05), 1% (0.01), 0.5% (0.005), and 0.1% (0.001). If a test of significance gives a p-value lower than the significance level α, the null hypothesis is rejected. Such results are informally referred to as 'statistically significant'. For example, if someone argues that "there's only one chance in a thousand this could have happened by coincidence", a 0.001 level of statistical significance is being implied. The lower the significance level chosen, the stronger the evidence required. The choice of significance level is somewhat arbitrary, but for many applications, a level of 5% is chosen by convention.
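As an illustrative sketch (not part of the article), the decision rule above can be expressed with only the standard library; the z-statistic here is a hypothetical value standing in for the output of some test:

```python
# Comparing a computed p-value against a chosen significance level alpha.
# The z-statistic below is a hypothetical example value.
from statistics import NormalDist

alpha = 0.05   # conventional 5% significance level
z = 2.17       # hypothetical z-statistic from some test

# Two-tailed p-value: probability of a result at least this extreme by chance,
# assuming the null hypothesis is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```

Note that the same p-value would clear the 5% threshold but not the 1% threshold, which is why the chosen level must be stated alongside the result.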
In some situations it is convenient to express the statistical significance as 1 − α. In general, when interpreting a stated significance, one must be careful to note what, precisely, is being tested statistically.
Different levels of α trade off countervailing effects. Smaller levels of α increase confidence in the determination of significance, but run an increased risk of failing to reject a false null hypothesis (a Type II error, or "false negative determination"), and so have less statistical power. The selection of the level α thus inevitably involves a compromise between significance and power, and consequently between the Type I error and the Type II error. More powerful experiments – usually experiments with more subjects or replications – can reduce the severity of this trade-off to an arbitrary degree.
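The trade-off can be made concrete with a small sketch. For a one-sided z-test of a mean shift, with an assumed (hypothetical) standardized effect size and sample size, lowering α visibly lowers power:

```python
# Hypothetical illustration of the alpha/power trade-off for a one-sided
# z-test of a mean shift; the effect size and sample size are assumed values.
from math import sqrt
from statistics import NormalDist

effect = 0.3   # assumed true standardized effect size
n = 50         # assumed sample size
norm = NormalDist()

powers = []
for alpha in (0.10, 0.05, 0.01, 0.001):
    z_crit = norm.inv_cdf(1 - alpha)                  # rejection threshold
    power = 1 - norm.cdf(z_crit - effect * sqrt(n))   # P(reject | real effect)
    powers.append(power)
    print(f"alpha={alpha:<6} power={power:.3f} beta={1 - power:.3f}")
```

With these assumed numbers, power falls monotonically as α shrinks, which is the compromise described above; increasing `n` would raise every power value and soften the choice.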
Graphically, statistical significance is often indicated by the use of star symbols (*). The number of stars usually indicates the significance level: one star (*) for 0.05, two (**) for 0.01, and three (***) for 0.001 or 0.005. These star symbols may also be used on graphics, such as bar charts, to indicate a significant effect, such as a significant difference in the mean value between two populations.
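A minimal helper capturing the star convention just described (using the 0.05 / 0.01 / 0.001 thresholds; the function name is ours, not a standard API):

```python
# Map a p-value to the conventional star marker described above.
def significance_stars(p_value: float) -> str:
    """Return the star annotation for a p-value (*, **, *** or empty)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 5% level

print(significance_stars(0.03))    # *
print(significance_stars(0.0005))  # ***
```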
In some fields, for example nuclear and particle physics, it is common to express statistical significance in units of the standard deviation σ of a normal distribution. A statistical significance of "nσ" can be converted into a value of α by use of the cumulative distribution function Φ of the standard normal distribution, through the relation α = 1 − Φ(n) (one-tailed), or via use of the error function, α = 1 − erf(n/√2) (two-tailed).
Tabulated values of these functions are often found in statistics textbooks: see standard normal table. The use of σ implicitly assumes a normal distribution of measurement values. For example, if a theory predicts a parameter to have a value of, say, 109 ± 3, and one measures the parameter to be 100, then one might report the measurement as a "3σ deviation" from the theoretical prediction. In terms of α, this statement is equivalent to saying that "assuming the theory is true, the likelihood of obtaining the experimental result by coincidence is 0.27%", since 1 − erf(3/√2) = 0.0027 (the exact value again depends on whether a one-tailed or two-tailed test is appropriate).
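The σ-to-α conversion can be checked numerically with the standard library, reproducing the 0.27% figure for the 3σ example:

```python
# Convert an n-sigma significance into alpha via the standard normal CDF
# (one-tailed) and via the error function (two-tailed).
from math import erf, sqrt
from statistics import NormalDist

n_sigma = 3.0

alpha_one_tailed = 1 - NormalDist().cdf(n_sigma)   # 1 - Phi(n)
alpha_two_tailed = 1 - erf(n_sigma / sqrt(2))      # 1 - erf(n / sqrt(2))

print(f"3-sigma, one-tailed: alpha = {alpha_one_tailed:.5f}")
print(f"3-sigma, two-tailed: alpha = {alpha_two_tailed:.5f}")  # ~0.0027
```

The two forms agree up to the tail convention: the two-tailed value is exactly twice the one-tailed one, since erf(n/√2) = 2Φ(n) − 1.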
Fixed significance levels such as those mentioned above may be regarded as useful in exploratory data analyses. However, where the outcome of a test is essentially the final outcome of an experiment or other study, modern practice is to quote the p-value explicitly and, importantly, to state whether it is judged to be significant. This allows the maximum information to be carried from a summary of the study into meta-analyses.
The scientific literature contains extensive discussion of the concept of statistical significance and in particular of its potential misuse and abuse.
Statistical significance can be considered to be the confidence one has in a given result. In a comparison study, it depends on the relative difference between the groups compared, the number of measurements, and the noise associated with the measurements. In other words, the confidence that a given result is non-random (i.e. not a consequence of chance) depends on the signal-to-noise ratio (SNR) and the sample size.
Expressed mathematically, the confidence that a result is not by random chance is given by the following formula by Sackett: confidence = (signal/noise) × √(sample size).
For clarity, the above formula is presented in tabular form below.
Dependence of confidence with noise, signal and sample size (tabular form)
| Parameter   | Parameter increases  | Parameter decreases  |
|-------------|----------------------|----------------------|
| Noise       | Confidence decreases | Confidence increases |
| Signal      | Confidence increases | Confidence decreases |
| Sample size | Confidence increases | Confidence decreases |
In words, confidence in a result is high if the noise is low, the sample size is large, and/or the effect size (signal) is large. The confidence in a result (and its associated confidence interval) does not depend on effect size alone: if the sample size is large and the noise is low, even a small effect size can be measured with great confidence. Whether a small effect size is considered important depends on the context of the events compared.
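Assuming Sackett's relation takes the form confidence ∝ (signal/noise) × √(sample size), consistent with the dependence table above, the qualitative behavior can be checked directly (all numbers here are purely illustrative):

```python
# Sketch of the signal/noise/sample-size dependence: confidence grows with
# signal and sample size and shrinks with noise. Values are illustrative only.
from math import sqrt

def confidence(signal: float, noise: float, sample_size: int) -> float:
    """Sackett-style confidence measure: SNR times sqrt of sample size."""
    return (signal / noise) * sqrt(sample_size)

base = confidence(signal=2.0, noise=1.0, sample_size=25)  # (2/1)*5 = 10.0

assert confidence(2.0, 2.0, 25) < base    # more noise  -> less confidence
assert confidence(4.0, 1.0, 25) > base    # more signal -> more confidence
assert confidence(2.0, 1.0, 100) > base   # larger sample -> more confidence
```

The √(sample size) term also shows why a small effect can still be measured with great confidence: quadrupling the sample doubles the confidence at fixed signal and noise.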
In medicine, small effect sizes (reflected by small increases of risk) are often considered clinically relevant and are frequently used to guide treatment decisions if there is great confidence in them. Whether a given treatment is considered a worthy endeavour is dependent on the risks, benefits and costs.
Order refers to which comes first: the test data or the specification of the hypotheses to be tested. When the hypotheses come first the test is "prospective" and when the data come first the test is "retrospective". Traditionally, prospective tests have been required. However, there is a well-known generally accepted hypothesis test in which the data preceded the hypotheses. In that study the statistical significance was calculated the same as it would have been had the hypotheses preceded the data. A related question in use of statistics in the physical sciences is whether probability theory applies to the known past in the same way that it applies to the unknown future. Although these questions have been discussed, there are few references in this area of statistics. It hardly seems reasonable to accord the same status to a hypothesis that explains the results of an experiment after the results are known as to a hypothesis that predicts the results of an experiment before they are known. This is because it is well known that predicting an event before it occurs is more difficult than explaining it after it occurs.