From Wikipedia, the free encyclopedia
In statistics, meta-analysis comprises statistical methods for contrasting and combining results from different studies in the hope of identifying patterns among study results, sources of disagreement among those results, or other interesting relationships that may come to light in the context of multiple studies.^{[1]} Meta-analysis can be thought of as "conducting research about previous research." In its simplest form, meta-analysis is done by identifying a common statistical measure that is shared between studies, such as effect size or p-value, and calculating a weighted average of that common measure. This weighting is usually related to the sample sizes of the individual studies, although it can also include other factors, such as study quality.
The motivation for a meta-analysis is to aggregate information in order to achieve higher statistical power for the measure of interest, as opposed to a less precise measure derived from a single study. In performing a meta-analysis, an investigator must make choices, many of which can affect its results, including deciding how to search for studies, selecting studies based on a set of objective criteria, dealing with incomplete data, analyzing the data, and accounting for or choosing not to account for publication bias.^{[2]}
Meta-analyses are often, but not always, important components of a systematic review procedure. For instance, a meta-analysis may be conducted on several clinical trials of a medical treatment, in an effort to obtain a better understanding of how well the treatment works. Here it is convenient to follow the terminology used by the Cochrane Collaboration,^{[3]} and use "meta-analysis" to refer to statistical methods of combining evidence, leaving other aspects of 'research synthesis' or 'evidence synthesis', such as combining information from qualitative studies, for the more general context of systematic reviews.
The historical roots of meta-analysis can be traced back to 17th-century studies of astronomy. A paper published in 1904 by the statistician Karl Pearson in the British Medical Journal,^{[4]} which collated data from several studies of typhoid inoculation, is seen as the first time a meta-analytic approach was used to aggregate the outcomes of multiple clinical studies.^{[5]}^{[6]} The first meta-analysis of all conceptually identical experiments concerning a particular research issue, and conducted by independent researchers, has been identified as the 1940 book-length publication Extrasensory Perception After Sixty Years, authored by Duke University psychologists J. G. Pratt, J. B. Rhine, and associates.^{[7]} This encompassed a review of 145 reports on ESP experiments published from 1882 to 1939, and included an estimate of the influence of unpublished papers on the overall effect (the file-drawer problem). Although meta-analysis is widely used in epidemiology and evidence-based medicine today, a meta-analysis of a medical treatment was not published until 1955. In the 1970s, more sophisticated analytical techniques were introduced in educational research, starting with the work of Gene V. Glass, Frank L. Schmidt and John E. Hunter.
The term "meta-analysis" was coined by Gene V. Glass,^{[8]} who was the first modern statistician to formalize the use of the term meta-analysis. He states "my major interest currently is in what we have come to call ...the meta-analysis of research. The term is a bit grand, but it is precise and apt ... Meta-analysis refers to the analysis of analyses". Although this led to him being widely recognized as the modern founder of the method, the methodology behind what he termed "meta-analysis" predates his work by several decades.^{[9]}^{[10]} The statistical theory surrounding meta-analysis was greatly advanced by the work of Nambury S. Raju, Larry V. Hedges, Harris Cooper, Ingram Olkin, John E. Hunter, Jacob Cohen, Thomas C. Chalmers, Robert Rosenthal and Frank L. Schmidt.
Conceptually, a meta-analysis uses a statistical approach to combine the results from multiple studies in an effort to increase power (over individual studies), improve estimates of the size of the effect and/or to resolve uncertainty when reports disagree. Basically, it produces a weighted average of the included study results and this approach has several advantages:
A meta-analysis of several small studies does not predict the results of a single large study.^{[11]} Some have argued that a weakness of the method is that sources of bias are not controlled by the method: a good meta-analysis of badly designed studies will still result in bad statistics.^{[12]} This would mean that only methodologically sound studies should be included in a meta-analysis, a practice called 'best evidence synthesis'.^{[12]} Other meta-analysts would include weaker studies, and add a study-level predictor variable that reflects the methodological quality of the studies to examine the effect of study quality on the effect size.^{[13]} However, others have argued that a better approach is to preserve information about the variance in the study sample, casting as wide a net as possible, and that methodological selection criteria introduce unwanted subjectivity, defeating the purpose of the approach.^{[14]}
Another potential pitfall is the reliance on the available corpus of published studies, which may create exaggerated outcomes due to publication bias, as studies which show negative results or insignificant results are less likely to be published. For example, one may have overlooked dissertation studies or studies that have never been published. This is not easily solved, as one cannot know how many studies have gone unreported.^{[15]}
This file drawer problem results in a distribution of effect sizes that is biased, skewed or completely cut off, creating a serious base rate fallacy, in which the significance of the published studies is overestimated, as other studies were either not submitted for publication or were rejected. This should be seriously considered when interpreting the outcomes of a meta-analysis.^{[15]}^{[16]}
The distribution of effect sizes can be visualized with a funnel plot, which is a scatter plot of sample size against effect size. For a given effect level, the smaller the study, the higher the probability of finding that effect by chance. At the same time, the higher the effect level, the lower the probability that a larger study produces that positive result by chance. If many negative studies were not published, the remaining positive studies give rise to a funnel plot in which effect size is inversely related to sample size; in other words, the higher the effect size, the smaller the sample size. An important part of the effect shown is then due to chance, which is not balanced in the plot because the unpublished negative data are absent. In contrast, when most studies were published, the effect shown has no reason to be biased by study size, so a symmetric funnel plot results. So, if no publication bias is present, one would expect no relation between sample size and effect size.^{[17]} A negative relation between sample size and effect size would imply that studies that found significant effects were more likely to be published and/or to be submitted for publication. There are several procedures available that attempt to correct for the file drawer problem, once identified, such as estimating the cut-off part of the distribution of study effects.
Methods for detecting publication bias have been controversial, as they typically have low power for detecting bias but may also produce false positives under some circumstances.^{[18]} For instance, small-study effects, wherein methodological differences between smaller and larger studies exist, may cause differences in effect sizes between studies that resemble publication bias.^{[clarification needed]} However, small-study effects may be just as problematic for the interpretation of meta-analyses, and the onus is on meta-analytic authors to investigate potential sources of bias. A Tandem Method for analyzing publication bias has been suggested for cutting down false positive error problems.^{[19]} This Tandem Method consists of three stages. Firstly, one calculates Orwin's fail-safe N, to check how many studies should be added in order to reduce the test statistic to a trivial size. If this number of studies is larger than the number of studies used in the meta-analysis, it is a sign that there is no publication bias, as in that case one needs a lot of studies to reduce the effect size. Secondly, one can do an Egger's regression test, which tests whether the funnel plot is symmetrical. As mentioned before, a symmetrical funnel plot is a sign that there is no publication bias, as the effect size and sample size are not dependent. Thirdly, one can do the trim-and-fill method, which imputes data if the funnel plot is asymmetrical. These are just a few of the methods that can be used; several more exist.
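The first two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numbers are hypothetical, Orwin's formula is given in its usual effect-size form (the number of additional studies averaging `mean_missing` needed to pull the mean effect down to a chosen trivial value `d_trivial`), and Egger's test is reduced to its core regression of the standardized effect on precision, without the significance test on the intercept.

```python
def orwin_failsafe_n(k, mean_effect, d_trivial, mean_missing=0.0):
    """Number of unseen studies averaging `mean_missing` needed to bring
    the mean effect of k studies down to `d_trivial` (Orwin's formula)."""
    return k * (mean_effect - d_trivial) / (d_trivial - mean_missing)


def egger_regression(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, slope). An intercept far from zero suggests
    funnel plot asymmetry; the slope estimates the underlying effect.
    """
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical example: 10 studies with a mean effect of 0.5,
# and 0.1 taken as the "trivial" effect size.
n_fs = orwin_failsafe_n(k=10, mean_effect=0.5, d_trivial=0.1)

# Hypothetical asymmetric funnel: the smaller the study (larger SE),
# the larger the reported effect.
intercept, slope = egger_regression(
    effects=[0.8, 0.6, 0.4, 0.2],
    std_errors=[0.4, 0.3, 0.2, 0.1],
)
```

With these made-up inputs the fail-safe N comes out at about 40, well above the 10 studies included, while the Egger intercept is about 2 with a slope near zero, which is the kind of asymmetry the second stage is meant to flag. The two stages can therefore disagree, which is one reason the text treats them as complementary checks.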
Nevertheless, it is suggested that 25% of meta-analyses in the psychological sciences may have publication bias.^{[19]} However, low power problems likely remain at issue, and estimations of publication bias may remain lower than the true amount.
Most discussions of publication bias focus on journal practices favoring publication of statistically significant findings. However, questionable research practices, such as reworking statistical models until significance is achieved, may also favor statistically significant findings in support of researchers' hypotheses.^{[20]}^{[21]} Questionable research practices are not necessarily dependent on sample size, and as such are unlikely to be evident on a funnel plot and may go undetected by most publication bias detection methods currently in use.
Other weaknesses are Simpson's paradox (two smaller studies may point in one direction, and the combination study in the opposite direction) and subjectivity in the coding of an effect or decisions about including or rejecting studies.^{[22]} There are two different ways to measure effect: correlation or standardized mean difference. The interpretation of effect size is arbitrary, and there is no universally agreed upon way to weigh the risk. It has not been determined whether the statistically most accurate method for combining results is the fixed, random or quality effects model.^{[citation needed]}
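Simpson's paradox is easiest to see with numbers. The sketch below uses hypothetical counts (they are not taken from any study cited here) for two studies in which the treated arm does better within each study, yet naively adding up the raw counts reverses the direction. The helper name `risk_difference` is an illustrative choice.

```python
def risk_difference(events_t, n_t, events_c, n_c):
    """Success rate in the treated arm minus success rate in the control arm."""
    return events_t / n_t - events_c / n_c


# Hypothetical studies:
# (successes treated, n treated, successes control, n control)
study_1 = (81, 87, 234, 270)
study_2 = (192, 263, 55, 80)

# Direction of the effect within each study.
within = [risk_difference(*s) for s in (study_1, study_2)]

# Naive pooling: add up the raw counts as if they came from one study.
pooled_counts = tuple(a + b for a, b in zip(study_1, study_2))
naive = risk_difference(*pooled_counts)
```

Both within-study differences are positive, while the naively pooled difference is negative, because the arms are unevenly distributed across the two studies. Any weighted average of the within-study differences would stay positive here, which is why meta-analytic methods combine study-level estimates rather than raw totals.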
The most severe fault in meta-analysis^{[23]} often occurs when the person or persons doing the meta-analysis have an economic, social, or political agenda such as the passage or defeat of legislation. People with these types of agendas may be more likely to abuse meta-analysis due to personal bias. For example, researchers favorable to the author's agenda are likely to have their studies cherry-picked while those not favorable will be ignored or labeled as "not credible". In addition, the favored authors may themselves be biased or paid to produce results that support their overall political, social, or economic goals in ways such as selecting small favorable data sets and not incorporating larger unfavorable data sets. The influence of such biases on the results of a meta-analysis is possible because the methodology of meta-analysis is highly malleable.^{[22]}
A 2011 study done to disclose possible conflicts of interest in underlying research studies used for medical meta-analyses reviewed 29 meta-analyses and found that conflicts of interest in the studies underlying the meta-analyses were rarely disclosed. The 29 meta-analyses included 11 from general medicine journals, 15 from specialty medicine journals, and three from the Cochrane Database of Systematic Reviews. The 29 meta-analyses reviewed a total of 509 randomized controlled trials (RCTs). Of these, 318 RCTs reported funding sources, with 219 (69%) receiving funding from industry^{[clarification needed]}. Of the 509 RCTs, 132 reported author conflict of interest disclosures, with 91 studies (69%) disclosing one or more authors having industry financial ties. The information was, however, seldom reflected in the meta-analyses. Only two (7%) reported RCT funding sources and none reported RCT author-industry ties. The authors concluded “without acknowledgment of COI due to industry funding or author industry financial ties from RCTs included in meta-analyses, readers’ understanding and appraisal of the evidence from the meta-analysis may be compromised.”^{[24]}
1. Formulation of the problem
2. Search of literature
3. Selection of studies ('incorporation criteria')
4. Decide which dependent variables or summary measures are allowed. For instance:
5. Selection of a meta-regression statistical model: e.g. simple regression, fixed-effect meta-regression or random-effect meta-regression. Meta-regression is a tool used in meta-analysis to examine the impact of moderator variables on study effect size using regression-based techniques. Meta-regression is more effective at this task than are standard regression techniques.
For reporting guidelines, see the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.^{[25]}
In general, two types of evidence can be distinguished when performing a meta-analysis: Individual Participant Data (IPD) and Aggregate Data (AD). Whereas IPD represents raw data as collected by the study centers, AD is more commonly available (e.g. from the literature) and typically represents summary estimates such as odds ratios or relative risks. This distinction has raised the need for different meta-analytic methods when evidence synthesis is desired, and has led to the development of one-stage and two-stage methods. In one-stage methods the IPD from all studies are modeled simultaneously whilst accounting for the clustering of participants within studies. Conversely, two-stage methods synthesize the AD from each study, taking study weights into account. By reducing IPD to AD, two-stage methods can also be applied when IPD is available; this makes them an appealing choice when performing a meta-analysis. Although it is conventionally believed that one-stage and two-stage methods yield similar results, recent studies have shown that they may occasionally lead to different conclusions.^{[26]}
The fixed effect model provides a weighted average of a series of study estimates. The inverse of the estimates' variance is commonly used as study weight, such that larger studies tend to contribute more than smaller studies to the weighted average. Consequently, when studies within a meta-analysis are dominated by a very large study, the findings from smaller studies are practically ignored.^{[27]} Most importantly, the fixed effect model assumes that all included studies investigate the same population, use the same variable and outcome definitions, etc. This assumption is typically unrealistic, as research is often prone to several sources of heterogeneity; e.g. treatment effects may differ according to locale, dosage levels, study conditions, and so on.
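The inverse-variance weighting described above can be written out directly. The following is a minimal sketch, assuming each study supplies an effect estimate and its variance; the function name `fixed_effect` and the three data points are hypothetical and chosen only to show how one very large study dominates the pooled result.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance weighted (fixed effect) pooled estimate.

    Returns the pooled estimate, its standard error, and each
    study's share of the total weight.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    shares = [w / total for w in weights]
    return pooled, se, shares


# Hypothetical data: two small studies and one very large one.
estimates = [0.60, 0.50, 0.10]
variances = [0.04, 0.04, 0.001]

pooled, se, shares = fixed_effect(estimates, variances)
```

Here the large study carries more than 95% of the total weight, so the pooled estimate (about 0.12) sits almost on top of its value of 0.10 even though the two small studies report effects of 0.5 and 0.6.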
A common model used to synthesize heterogeneous research is the random effects model of meta-analysis. This is simply the weighted average of the effect sizes of a group of studies. The weight that is applied in this process of weighted averaging with a random effects meta-analysis is achieved in two steps:^{[28]}
This means that the greater this variability in effect sizes (otherwise known as heterogeneity), the greater the unweighting, and this can reach a point where the random effects meta-analysis result becomes simply the unweighted average effect size across the studies. At the other extreme, when all effect sizes are similar (or variability does not exceed sampling error), no REVC is applied and the random effects meta-analysis defaults to simply a fixed effect meta-analysis (only inverse variance weighting).
The extent of this reversal is solely dependent on two factors:^{[29]}
Since neither of these factors automatically indicates a faulty larger study or more reliable smaller studies, the redistribution of weights under this model will not bear a relationship to what these studies actually might offer. Indeed, it has been demonstrated that redistribution of weights is simply in one direction from larger to smaller studies as heterogeneity increases until eventually all studies have equal weight and no more redistribution is possible.^{[29]} Another issue with the random effects model is that the most commonly used confidence intervals generally do not retain their coverage probability above the specified nominal level and thus substantially underestimate the statistical error and are potentially overconfident in their conclusions.^{[30]}^{[31]} Several fixes have been suggested^{[32]}^{[33]} but the debate continues on.^{[31]}^{[34]} A further concern is that the average treatment effect can sometimes be even less conservative compared to the fixed effect model^{[35]} and therefore misleading in practice. One interpretational fix that has been suggested is to create a prediction interval around the random effects estimate to portray the range of possible effects in practice.^{[36]} However, an assumption behind the calculation of such a prediction interval is that trials are considered more or less homogeneous entities and that included patient populations and comparator treatments should be considered exchangeable^{[37]} and this is usually unattainable in practice.
The most widely used method to estimate the between-study variance (REVC) is the DerSimonian-Laird (DL) approach.^{[38]} More recently, the iterative and computationally intensive restricted maximum likelihood (REML) approach has emerged and is catching up. However, a comparison between these two (and more) models demonstrated that there is little to gain and that DL is quite adequate in most scenarios.^{[39]}^{[40]}
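To make the random effects weighting concrete, here is a minimal sketch of the DerSimonian-Laird moment estimator in its standard textbook form; the text above does not spell out the formula, so the details are supplied as an assumption. The between-study variance `tau2` plays the role of the REVC, and the function name, dictionary keys and data are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(estimates)
    w = [1.0 / v for v in variances]              # fixed effect weights
    sw = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)            # truncated at zero
    # Random effects weights: tau^2 is added to every study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    return {
        "fixed": fe,
        "tau2": tau2,
        "pooled": pooled,
        "se": math.sqrt(1.0 / sw_re),
        "shares": [wi / sw_re for wi in w_re],
    }


# Hypothetical data: two small studies and one very large one.
result = dersimonian_laird([0.60, 0.50, 0.10], [0.04, 0.04, 0.001])

# Homogeneous case: identical estimates give tau^2 = 0,
# so the result collapses to the fixed effect estimate.
same = dersimonian_laird([0.30, 0.30, 0.30], [0.04, 0.02, 0.01])
```

With the heterogeneous data the large study's share of the weight falls from over 95% under inverse-variance weighting to under half, and the pooled estimate moves from about 0.12 towards the unweighted mean of 0.40. This is the redistribution of weight from larger to smaller studies that the surrounding paragraphs discuss.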
However, most meta-analyses include between 2 and 4 studies, and such a sample is more often than not inadequate to accurately estimate heterogeneity. Thus it appears that in small meta-analyses, an incorrect zero between-study variance estimate is obtained, leading to a false homogeneity assumption. Overall, it appears that heterogeneity is being consistently underestimated in meta-analyses, and sensitivity analyses in which high heterogeneity levels are assumed could be informative.^{[41]} Numerous advanced random-effects models are available in Stata with the metaan command.^{[42]} Most of these advanced methods have been implemented in a free and easy to use Microsoft Excel add-on, MetaEasy.^{[43]}^{[44]} These random effects models and software packages relate to study-aggregate meta-analyses, and researchers wishing to conduct individual patient data (IPD) meta-analyses need to consider mixed-effects modelling approaches.^{[45]}
Doi and Thalib originally introduced the quality effects model.^{[46]} They^{[47]} introduced a new approach to adjustment for inter-study variability by incorporating a relevant component (quality) that differs between studies, in addition to the weight based on the intra-study differences that is used in any fixed effects meta-analysis model. The strength of the quality effects meta-analysis is that it allows available methodological evidence to be used over subjective random probability, and thereby helps to close the damaging gap which has opened up between methodology and statistics in clinical research. To do this, a correction for the quality-adjusted weight of the ith study, called taui, is introduced.^{[48]} This is a composite based on the quality of other studies except the study under consideration and is utilized to redistribute quality-adjusted weights based on the quality-adjusted weights of other studies. In other words, if study i is of good quality and other studies are of poor quality, a proportion of their quality-adjusted weights is mathematically redistributed to study i, giving it more weight towards the overall effect size. As studies increase in quality, redistribution becomes progressively less and ceases when all studies are of perfect quality. This model thus replaces the untenable interpretations that abound in the literature, and software is available to explore this method further.^{[49]}
Doi & Barendregt, working in collaboration with Khan, Thalib and Williams (from the University of Queensland, University of Southern Queensland and Kuwait University), have created an inverse variance quasi-likelihood based alternative (IVhet) to the random effects (RE) model, for which details are available online.^{[50]} This was incorporated into MetaXL version 2.0,^{[49]} a free Microsoft Excel add-in for meta-analysis produced by Epigear International Pty Ltd, and made available on 5 April 2014. The authors state that a clear advantage of this model is that it resolves the two main problems of the random effects model. The first advantage of the IVhet model is that coverage remains at the nominal (usually 95%) level for the confidence interval, unlike the random effects model, whose coverage drops with increasing heterogeneity.^{[30]}^{[31]} The second advantage is that the IVhet model maintains the inverse variance weights of individual studies, unlike the RE model, which gives small studies more weight (and therefore larger studies less) with increasing heterogeneity. When heterogeneity becomes large, the individual study weights under the RE model become equal and thus the RE model returns an arithmetic mean rather than a weighted average, and this seems unjustified. This presumably unintended side effect of the RE model is avoided by the IVhet model, which thus differs from the RE model estimate in two respects:^{[50]} pooled estimates will favor larger trials (as opposed to penalizing larger trials in the RE model) and will have a confidence interval that remains within the nominal coverage under uncertainty (heterogeneity). Doi & Barendregt suggest that while the RE model provides an alternative method of pooling the study data, their simulation results (on the Epigear website^{[49]}) demonstrate that using a more specified probability model with untenable assumptions, as with the RE model, does not necessarily provide better results.
Researchers can now access this new IVhet model through MetaXL^{[49]} for further evaluation and comparison with the conventional random effects model.
Modern statistical meta-analysis does more than just combine the effect sizes of a set of studies using a weighted average. It can test if the outcomes of studies show more variation than the variation that is expected because of the sampling of different numbers of research participants. Additionally, study characteristics such as measurement instrument used, population sampled, or aspects of the studies' design can be coded and used to reduce variance of the estimator (see statistical models above). Thus some methodological weaknesses in studies can be corrected statistically. Other uses of meta-analytic methods include the development of clinical prediction models, where meta-analysis may be used to combine data from different research centers,^{[51]} or even to aggregate existing prediction models.^{[52]}
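One common way to test for more variation than sampling alone would produce is Cochran's Q, often summarized by the I² statistic. Neither is named in the text above, so the sketch below should be read as one possible choice rather than the method the article has in mind. The function name and data are hypothetical, and the p-value is left out because it would need a chi-square distribution function from outside the standard library.

```python
def heterogeneity(estimates, variances):
    """Cochran's Q, its degrees of freedom, and I^2.

    Under homogeneity Q is approximately chi-square distributed with
    k - 1 degrees of freedom; I^2 is the share of total variation
    attributed to between-study differences.
    """
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical data: two small studies and one very large one.
q, df, i2 = heterogeneity([0.60, 0.50, 0.10], [0.04, 0.04, 0.001])
```

For these invented numbers Q is well above its 2 degrees of freedom and I² is just under 0.8, which would usually be read as substantial heterogeneity. With so few studies, however, such estimates are unstable, as the discussion of small meta-analyses above points out.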
Meta-analysis can be done with single-subject designs as well as group research designs. This is important because much research has been done with single-subject research designs. Considerable dispute exists over the most appropriate meta-analytic technique for single-subject research.^{[53]}
Meta-analysis leads to a shift of emphasis from single studies to multiple studies. It emphasizes the practical importance of the effect size instead of the statistical significance of individual studies. This shift in thinking has been termed "meta-analytic thinking". The results of a meta-analysis are often shown in a forest plot.
Results from studies are combined using different approaches. One approach frequently used in meta-analysis in health care research is termed the 'inverse variance method'. The average effect size across all studies is computed as a weighted mean, whereby the weights are equal to the inverse variance of each study's effect estimator. Larger studies and studies with less random variation are given greater weight than smaller studies. Other common approaches include the Mantel–Haenszel method^{[54]} and the Peto method.^{[citation needed]}
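As an illustration of the Mantel–Haenszel method mentioned above, the sketch below computes its pooled odds ratio from 2×2 tables in the standard form. The table layout, function name and counts are assumptions made for the example; confidence intervals and the related test are omitted.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio.

    Each table is (a, b, c, d):
      a = events in the treated arm,  b = non-events in the treated arm,
      c = events in the control arm,  d = non-events in the control arm.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical trials.
tables = [
    (10, 90, 20, 80),   # odds ratio 800 / 1800, about 0.44
    (5, 45, 8, 42),     # odds ratio 210 / 360, about 0.58
]

pooled_or = mantel_haenszel_or(tables)
```

The pooled value lies between the two study-level odds ratios, closer to that of the larger trial. Unlike the inverse variance method, this estimator works on the raw counts and does not require a variance for each study's log odds ratio, which is one reason it is often used when events are sparse.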
A recent approach to studying the influence that weighting schemes can have on results has been proposed through the construct of gravity,^{[clarification needed]} which is a special case of combinatorial meta-analysis.
Signed differential mapping is a statistical technique for meta-analyzing studies on differences in brain activity or structure that used neuroimaging techniques such as fMRI, VBM or PET.
Different high-throughput techniques such as microarrays have been used to understand gene expression. MicroRNA expression profiles have been used to identify differentially expressed microRNAs in a particular cell or tissue type or disease condition, or to check the effect of a treatment. A meta-analysis of such expression profiles was performed to derive novel conclusions and to validate the known findings.^{[55]}
