Abstract
Objectives To illustrate the pitfalls of using meta-analysis to combine estimates of effect from trials that are highly varied and have a high potential for bias.
Methods We used a random-effects meta-analysis to pool the results of 51 sham-controlled acupuncture trials of chronic pain published in English before 2008, and explored the heterogeneity using meta-regression. We repeated the process on a subset of these trials that used a visually credible non-penetrating sham device as the control (n=12).
Results Both analyses showed high levels of heterogeneity, and many studies were at risk of bias. Meta-regression did not explain the heterogeneity.
Conclusions Trials of interventions that have a high potential for bias, such as many in the acupuncture literature, do not meet the assumptions of the statistical procedure that underlies random-effects meta-analysis. Even in the absence of bias, heterogeneity in meta-analyses is not accounted for by the CIs around the pooled estimate.
 Acupuncture
 Systematic Reviews
 Statistics & Research Methods
Introduction
Trials of complex (multicomponent) interventions such as acupuncture1 involve many components that can differ between trials, so trials with different designs may estimate different underlying true effects of treatment. Such components include the population, acupuncture ‘dose’, the acupuncturists, condition severity, the comparator intervention and the analysis methods. Many systematic reviews of acupuncture contain meta-analyses that provide a pooled estimate of effect. In a meta-analysis, variability between trials that is not due to sampling variation is termed heterogeneity (table 1).
Heterogeneity can complicate the interpretation of a meta-analysis because the observed differences in effect between studies may be due to trial-level methodological or clinical variation as well as to the intervention effect.2,3 If a review team decides that meta-analysis is appropriate for a particular set of trials, two main types of statistical method can be used: fixed-effect and random-effects (table 1). Fixed-effect modelling is used where there is no heterogeneity or where the treatment is thought to have a common underlying effect. However, the absence of heterogeneity is difficult to substantiate, and there is often considerable variation between acupuncture trials, so it may be hard to justify the assumption of a common effect (table 1). Random-effects methods can be used where heterogeneity is observed or expected. Although reviewers should consider a complex range of factors when deciding whether to report fixed-effect or random-effects results for a particular set of trials,4 systematic reviews commonly contain statements such as “We used a random-effects model for our meta-analysis to account for the observed heterogeneity”. Such statements imply that the problems caused by heterogeneity are solved by using a random-effects model. In fact, several assumptions must be met for a random-effects model to be valid, and in this paper we discuss the limitations of the random-effects model in light of the difficulty of meeting these assumptions.
The random-effects method assumes that, because of variation in design, some studies may produce large effects and others small effects; crucially, it does not assume a common treatment effect across studies. Statistically, each study's weight incorporates an estimate of the between-study variance, so that, in the absence of bias but in the presence of heterogeneity, the pooled estimate is the average of the distribution of effects underlying the observed trials, not the common effect of the treatment.2,3,5
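In symbols, one standard way to write the random-effects model is shown below; the notation is ours and is offered only as an illustration. Here y_i is the observed SMD in trial i, v_i its within-study variance, μ the mean of the distribution of true effects, τ^{2} the between-study variance and k the number of trials:

```latex
\begin{aligned}
y_i &= \mu + u_i + \varepsilon_i, \qquad i = 1, \dots, k,\\
u_i &\sim N(0, \tau^2), \qquad \varepsilon_i \sim N(0, v_i).
\end{aligned}
```

The fixed-effect model is the special case τ^{2} = 0, in which μ is a single common effect. The model has no separate term for bias: a trial-level bias can only show up in u_i, which is assumed to have mean zero.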
A key requirement of a random-effects meta-analysis is that between-study variation should be random: the studies should be a random sample from the relevant distribution of treatment effects, and their estimates should be free from bias.3 Any bias that has influenced the effect of a particular study will also be classified as between-study variation, and bias that occurs systematically may amplify the pooled effect.6,7 Random-effects models also give relatively more weight to smaller studies than a fixed-effect calculation does6,7 (table 1), increasing the influence of small studies on the pooled estimate. If the results of small studies are systematically biased, as some methodologists have observed,8 the model's assumption of unbiasedness is not met and random-effects modelling will amplify the effect of the bias.7
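The shift in weight towards small studies can be seen in a short sketch. It assumes inverse-variance weighting and the DerSimonian-Laird moment estimator of τ^{2}, a common default that is not necessarily the estimator used in any particular review; the function name pooled_estimates and all input numbers are hypothetical. The example mimics a small-study effect: three large trials with effects near zero and two small trials showing a large benefit (negative SMD).

```python
import math

# Illustrative sketch; the function name and data are hypothetical.
def pooled_estimates(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooling.

    Returns both pooled estimates, the random-effects standard error,
    tau^2, Cochran's Q and each study's percentage weight under each model.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect weights: inverse within-study variances.
    w_fe = [1.0 / v for v in variances]
    sum_fe = sum(w_fe)
    mu_fe = sum(w * y for w, y in zip(w_fe, effects)) / sum_fe
    # Cochran's Q and the DerSimonian-Laird moment estimate of tau^2.
    q = sum(w * (y - mu_fe) ** 2 for w, y in zip(w_fe, effects))
    c = sum_fe - sum(w * w for w in w_fe) / sum_fe
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights: tau^2 is added to every study's variance,
    # which evens out the weights and gives small studies a larger share.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(w_re)
    mu_re = sum(w * y for w, y in zip(w_re, effects)) / sum_re
    return {
        "mu_fe": mu_fe,
        "mu_re": mu_re,
        "se_re": math.sqrt(1.0 / sum_re),
        "tau2": tau2,
        "q": q,
        "pct_fe": [100.0 * w / sum_fe for w in w_fe],
        "pct_re": [100.0 * w / sum_re for w in w_re],
    }

# Hypothetical SMDs: three large trials near zero, two small trials at -1.0.
result = pooled_estimates([0.0, -0.05, 0.05, -1.0, -1.0],
                          [0.01, 0.01, 0.01, 0.2, 0.2])
```

With these invented inputs, the two small trials' combined share of the weight rises from about 3% under the fixed-effect model to about 10% under the random-effects model, and the pooled estimate moves from about −0.03 to about −0.10, towards the small trials. If those large effects arose from bias, the random-effects average would carry more of it.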
Clearly, as random-effects methodology assumes that estimates are unbiased, identifying studies affected by bias is critical. Bias is usually assessed by classifying characteristics thought to influence the outcomes of a trial, such as the security of random allocation from subversion, the number and characteristics of study dropouts, whether data were analysed in the groups to which participants were originally allocated regardless of what they actually received, and whether the person collecting outcome data was blinded to group allocation. However, although there is consensus that bias tends to produce effects in favour of the intervention, there is uncertainty over how much such factors actually influence trial outcomes. Methodological studies have shown conflicting results in this regard, and the degree of influence may vary by trial or by group of trials.9–11 Systematic reviewers differ in whether they base their decision to pool data on the results of their bias assessment. Some might decide not to conduct a meta-analysis at all, to pool only studies at ‘low risk of bias’, or to pool only those meeting some predefined criterion, such as reporting that randomisation was adequately concealed. Others might pool all studies regardless of bias assessment, or conduct a sensitivity analysis on these results. Including trials that are affected by bias violates the assumption underlying the random-effects procedure, rendering it an inappropriate method. Further complications arise when bias in a trial remains undetected.
Meta-regression (table 1) can be used to explore how bias, or other factors thought to be common to a subset of trials and to cause heterogeneity, affects the pooled estimate. Meta-regression should be carried out within a random-effects framework. With this method, the heterogeneity caused by a particular component can be quantified and its effect on the pooled estimate observed. In practice, exploration of the causes of heterogeneity in a meta-analysis is limited by the recommendation of at least 10 trials per covariate examined and by uncertainty over which factors in a complex intervention are likely to explain the variation.
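For a single trial-level covariate x_i (for example, 1 if allocation concealment was adequate and 0 otherwise), a random-effects meta-regression can be written as below. The notation is ours and illustrative: y_i is the trial's SMD, v_i its within-study variance and τ^{2}_{RES} the between-study variance that remains after adjusting for x_i.

```latex
y_i = \beta_0 + \beta_1 x_i + u_i + \varepsilon_i,
\qquad u_i \sim N(0, \tau^2_{\mathrm{RES}}),
\qquad \varepsilon_i \sim N(0, v_i).
```

β_1 estimates the difference in mean effect between the two groups of trials. Each covariate adds a parameter to be estimated from the same set of trials, which is why the number of trials per covariate matters.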
To summarise, trials of complex interventions such as acupuncture are likely to give rise to heterogeneity in meta-analysis. A common method used in such cases is random-effects meta-analysis, which provides an average of estimates across the trials. For this method to be valid, the underlying effects of the trials must be drawn from a normal distribution and the trials must be free from systematic effects of bias. Prediction intervals are recommended for interpreting the spread of effects around this average, because they represent the between-study variability more accurately than CIs do.
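One widely used form of the approximate 95% prediction interval for the true effect in a new trial is given below for illustration; other formulations exist. Here μ̂ is the random-effects pooled estimate, SE(μ̂) its standard error, τ̂^{2} the estimated between-study variance, k the number of trials and t_{k−2, 0.975} the 97.5th percentile of a t distribution with k − 2 degrees of freedom.

```latex
\text{prediction interval: } \hat{\mu} \pm t_{k-2,\,0.975}\,\sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}(\hat{\mu})^2}
\qquad
\text{CI: } \hat{\mu} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\mu})
```

As trials are added, SE(μ̂) shrinks and the CI narrows, but because τ̂^{2} stays under the square root and the t quantile exceeds 1.96, the half-width of the prediction interval always exceeds 1.96τ̂. The interval requires at least three trials.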
The goal of this paper is to illustrate some of the problems facing systematic reviewers of acupuncture trials who try to meet the assumptions of a random-effects meta-analysis, by interpreting the results of a heterogeneous meta-analysis of acupuncture versus sham and attempting to examine the underlying causes of heterogeneity.
Methods
Sample inclusion and exclusion criteria
The search strategy, methods and definitions are those used in a larger project and have been published in detail elsewhere.22,23 For this illustrative analysis we used English-language randomised controlled trials (RCTs) of acupuncture for the treatment of any medical or psychological condition in adults, published by November 2007. We did not contact authors for clarification of data.
Two people independently screened full papers for trials with one or more arms that used a sham acupuncture control, which we defined as a device used to mimic an acupuncture needle and believed by the investigator to be inferior to acupuncture. We included only RCTs of musculoskeletal or neuropathic ‘chronic’ pain (≥2 weeks in duration). We pooled trials reporting continuous pain outcomes because these were more numerous than those reporting dichotomous outcomes. Extensive data on the description of each trial, quality criteria, methods and outcomes were extracted by one author and checked by another, with discrepancies resolved by discussion.
Metaanalysis
A random-effects meta-analysis18 of the first pain outcome measured after the end of treatment was conducted using the metan module of Stata V.12 (StataCorp, Texas, USA).24 Because outcomes were measured in a range of different units, Hedges' adjusted g was used to calculate the standardised mean difference (SMD) for each trial.16 CIs for the pooled estimate were calculated using standard methods.16
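To illustrate how trials measured in different units are put on a common scale, the sketch below computes Hedges' adjusted g for one two-arm trial. The pooled standard deviation and the small-sample correction are standard; the standard-error formula follows, to our understanding, the large-sample form given in Cochrane's RevMan documentation, and other software may differ slightly. The function name hedges_g and the example figures are hypothetical. A negative g means lower pain in the acupuncture arm.

```python
import math

# Illustrative sketch; the function name and data are hypothetical.
def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' adjusted g (treatment minus control) and its standard error."""
    n = n_t + n_c
    # Pooled within-group standard deviation.
    s_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n - 2))
    d = (mean_t - mean_c) / s_pooled
    # Small-sample correction factor J = 1 - 3 / (4N - 9).
    g = d * (1 - 3 / (4 * n - 9))
    # Large-sample standard error (assumed RevMan-style form).
    se = math.sqrt(n / (n_t * n_c) + g ** 2 / (2 * (n - 3.94)))
    return g, se

# Hypothetical trials: one scored on a 0-10 scale, one on a 0-100 scale.
g_10, se_10 = hedges_g(4.0, 2.0, 30, 5.0, 2.0, 30)
g_100, se_100 = hedges_g(40.0, 20.0, 30, 50.0, 20.0, 30)
```

Both hypothetical trials give g of about −0.49 with an SE of about 0.26: a one-point difference on a 0-10 scale with SD 2 is the same standardised effect as a ten-point difference on a 0-100 scale with SD 20.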
Heterogeneity was quantified using I^{2}, and its CI was calculated with a non-central χ^{2} approach14 using the heterogi module of Stata.25
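For reference, the usual definitions of Cochran's Q and I^{2} are given below, with w_i = 1/v_i the inverse-variance weight of trial i, y_i its SMD, k the number of trials and μ̂_FE the fixed-effect (inverse-variance weighted) mean. The non-central χ^{2} CI used in this paper is not reproduced here.

```latex
\begin{aligned}
\hat{\mu}_{\mathrm{FE}} &= \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i},
\qquad
Q = \sum_{i=1}^{k} w_i \left(y_i - \hat{\mu}_{\mathrm{FE}}\right)^2,\\
I^2 &= \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%.
\end{aligned}
```

I^{2} is a proportion of the total variation in the estimates: it describes how much of the observed spread exceeds what sampling error alone would produce, not how widely the true effects are spread, which is measured by the between-study variance τ^{2}.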
Metaregression
We prespecified five factors (covariates) that we thought might be causing heterogeneity and explored their association with effect size and heterogeneity through meta-regression.21 Two were methodological factors (pain location, sham needle type), two were quality factors (allocation concealment, outcome assessor blinding) and one could be considered both a methodological and a quality factor (adequacy of sample size, defined as having provided a sample size estimate and recruited that target sample size).
We used the iterative restricted maximum likelihood (REML) estimator of the between-study variance (τ^{2})26 and a conservative estimator for the variances of the effect estimates, as implemented in the metareg module in Stata.21 To avoid problems in model estimation caused by small sample sizes, a minimum of 10 trials in the meta-analysis was required for each covariate in the model.
The proportion of between-study variance explained by the covariate, R^{2}_{ADJ} (the relative reduction in between-study variance21), was reported along with the two-tailed p value (α=0.05). The proportion of the residual variation after adjustment for the covariate that is not due to sampling variation was reported as I^{2}_{RES}. Note that R^{2}_{ADJ} can be negative; this simply indicates that the proportion explained is no greater than would be expected by chance.21
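The definitions above can be made concrete with a sketch of a meta-regression on one binary covariate (for example, adequate allocation concealment coded 1, otherwise 0). It is not the metareg procedure used in this paper: for brevity it swaps in method-of-moments (DerSimonian-Laird-type) estimates of τ^{2} instead of REML, and computes I^{2}_{RES} from the residual Q statistic, so its numbers would not match metareg output. The function name meta_regression_binary, the data and the ten-trials-per-covariate guard parameter are illustrative assumptions.

```python
# Illustrative sketch; function name, guard parameter and data are hypothetical.
def meta_regression_binary(effects, variances, x, min_trials_per_covariate=10):
    """Random-effects meta-regression on one 0/1 covariate.

    Uses method-of-moments estimates of tau^2 (not REML) for both the
    intercept-only model and the covariate model.
    """
    k = len(effects)
    if not (k == len(variances) == len(x)):
        raise ValueError("effects, variances and x must have equal length")
    if k < min_trials_per_covariate:  # one covariate in the model
        raise ValueError("too few trials for a covariate")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    # Intercept-only model: DerSimonian-Laird tau^2 (the unadjusted value).
    mu0 = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q0 = sum(wi * (yi - mu0) ** 2 for wi, yi in zip(w, effects))
    tau2_null = max(0.0, (q0 - (k - 1)) / (sw - sw2 / sw))
    # Covariate model: weighted least squares of y on [1, x] with weights w.
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, effects))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, effects))
    det = sw * s2 - s1 * s1
    if det <= 0:
        raise ValueError("covariate does not vary across trials")
    # Entries of the symmetric 2x2 inverse (X'WX)^-1 = [[a, b], [b, d]].
    a, b, d = s2 / det, -s1 / det, sw / det
    beta0 = a * t0 + b * t1
    beta1 = b * t0 + d * t1
    q_res = sum(wi * (yi - beta0 - beta1 * xi) ** 2
                for wi, xi, yi in zip(w, x, effects))
    df_res = k - 2
    # Method-of-moments residual tau^2 = (Q_res - df) / tr(P), where
    # tr(P) = sum(w) - tr((X'WX)^-1 X'W^2X).
    m01 = sum(wi * wi * xi for wi, xi in zip(w, x))
    m11 = sum(wi * wi * xi * xi for wi, xi in zip(w, x))
    trace_p = sw - (a * sw2 + 2 * b * m01 + d * m11)
    tau2_res = max(0.0, (q_res - df_res) / trace_p)
    # R^2_ADJ: relative reduction in between-study variance (can be < 0).
    r2_adj = (tau2_null - tau2_res) / tau2_null if tau2_null > 0 else float("nan")
    # I^2_RES: share of residual variation beyond sampling error, from Q_res.
    i2_res = max(0.0, (q_res - df_res) / q_res) if q_res > 0 else 0.0
    return {"beta0": beta0, "beta1": beta1, "tau2_null": tau2_null,
            "tau2_res": tau2_res, "r2_adj": r2_adj, "i2_res": i2_res}

# Hypothetical data: ten trials with equal within-study variance.
effects = [-0.9, -0.7, -0.8, -0.85, -0.75, -0.1, 0.0, -0.2, -0.15, -0.05]
variances = [0.04] * 10
strong = meta_regression_binary(effects, variances, [0] * 5 + [1] * 5)
unrelated = meta_regression_binary(effects, variances, [0, 1] * 5)
```

In the first hypothetical fit the covariate splits the trials into groups with mean effects of −0.8 and −0.1, so the residual between-study variance falls to zero and R^{2}_{ADJ} is 1. In the second, the covariate is unrelated to effect size; the residual τ^{2} (about 0.113) is slightly larger than the unadjusted value (about 0.102), giving a negative R^{2}_{ADJ} of about −0.11.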
Results
Fifty-one of the 85 sham-controlled RCTs identified met the inclusion criteria. Because we do not aim to provide a systematic review of effectiveness, we have anonymised the trials and have not provided details of included or excluded studies. However, our comprehensive search strategy leads us to believe that the 85 trials represent the vast majority of English-language trials with our selected characteristics published before 2008.
Description of the full sample
Most trials (77%) reported a blinded outcome assessor, but 75% did not report adequate allocation concealment and only 29% reported recruiting an adequate sample (see online supplementary table A1). Real needles were the most commonly used sham control (35/51, 69%), with a wide variety of depths and locations reported (see online supplementary table A2).
Metaanalysis of the full sample
Because of the level of potential bias, we would not recommend pooling these trials, as the unbiasedness assumption of random-effects meta-analysis is not met. However, we continue for illustrative purposes. There is large variation in study effects (see online supplementary figure A1). The average effect of acupuncture is an SMD of −0.53 (95% CI −0.70 to −0.36). Interpreting this estimate alone, we might conclude that acupuncture is superior to sham. However, the CI does not account for the between-study variance, and there is substantial heterogeneity (I^{2}=81%, 95% CI 76% to 85%).
Metaregression of the full sample
We now apply meta-regression using the five preselected covariates. None of the covariates explained a significant proportion of the heterogeneity (all p>0.05; see online supplementary table A3). The largest proportion explained was just 6.5% (adequacy of sample size), with inadequately powered studies showing an average effect of −0.72 compared with −0.23 for adequately powered studies. The high I^{2}_{RES} indicates that the heterogeneity remains largely unexplained; many other differences in design, conduct and analysis could account for some of it.
Our illustrative research question, “Is acupuncture more effective than sham for treatment of pain lasting ≥2 weeks?”, is really too broad to be clinically useful, and most meta-analyses would focus on a question with tighter inclusion criteria. Table 2 lists some plausible meta-analysis questions that could be asked in systematic reviews of effectiveness, and the number of studies from our sample meeting the criteria for each question.
The number of trials available for analysis falls very quickly once we apply additional selection criteria, and even in these selected samples heterogeneity remains high (table 2). As random-effects meta-analyses are not recommended for fewer than four studies,3 this underscores the difficulty of assembling enough acupuncture trials on which to conduct a meta-analysis. For example, the 12 trials included for Question 1 (table 2) would be reduced further if we excluded trials that we considered to have given an inadequate ‘dose’ of acupuncture, those with a flexible treatment protocol, or those that used electroacupuncture.
For this worked example we now conduct a meta-analysis for Question 1, that is, trials that used a visually credible non-penetrating sham device.
Description of selected sample
Table 3 reveals at least two characteristics of concern for the validity of a random-effects meta-analysis: half the studies did not report adequate allocation concealment, and half may be inadequately powered. Both characteristics have been associated with potential systematic bias.8,27,28 If these factors are systematically associated with study effect, the assumptions underlying random-effects meta-analysis do not hold, and inferences drawn from such a model in these circumstances cannot be trusted. The other characteristics indicate several further possible causes of heterogeneity, and the reader should bear in mind that only a small selection of the methodological and clinical variation that could contribute to heterogeneity is presented.
Metaanalysis of selected sample
On meta-analysis, the pooled SMD is −0.19 (95% CI −0.47 to 0.08); because this interval includes zero, we conclude that there is little evidence of an effect of acupuncture over sham in this sample of trials (see online supplementary figure A2). However, most of the trials that do not report adequate allocation concealment (except study 37) have a point estimate favouring acupuncture, as do studies with smaller weights in the analysis (except study 6). This raises our suspicion that there may be an underlying systematic effect, with larger effects in smaller trials and in those with inadequate allocation concealment. Such an effect would invalidate the assumptions underlying a random-effects model. Trial size and allocation concealment both seem good candidates for further investigation, but our options are limited. (1) Remove these trials from the analysis, leaving only larger trials reporting concealed allocation; but how should we define ‘small’ and ‘large’ trials? (2) Respecify the meta-analysis to weight trials by these criteria, knowing that this method may introduce yet more bias. (3) Conduct a meta-regression to examine whether trial size and allocation concealment account for the heterogeneity; but with only 12 trials, we should examine just one covariate.
Metaregression of selected sample
We tested reported allocation concealment as a covariate in a meta-regression (table 4). The negative value of R^{2}_{ADJ} indicates that this covariate does not explain any more heterogeneity than would be expected by chance. We conclude that other factors account for the observed differences in effects between the trials, but we cannot explore them because there are too few trials relative to the number of candidate covariates. We cannot rule out that the sample is biased, but we cannot further explore the causes of heterogeneity without data dredging or excluding studies because their estimates ‘look different’ from those of other trials with similar characteristics (here, studies 6 and 34).
Discussion
In this worked example with real acupuncture trial data, we have demonstrated that meeting the assumptions that underlie a random-effects meta-analysis can be severely hampered by a small number of trials, high heterogeneity and high potential for bias. Moreover, we illustrate that random-effects meta-analysis does not solve the problem of pooling clinically and methodologically varied trials.
Because of their inherent variation in population, intervention and setting, it can be difficult to estimate the true underlying effect of complex interventions such as acupuncture using meta-analysis. Given the limitations in interpreting random-effects models presented in this paper, authors' urge to ‘come up with a number’ should in some cases be resisted; authors could instead provide detailed narrative reviews that describe which interventions have been shown to be promising, for whom, and under what circumstances.29
As in other areas, systematic reviewers wishing to estimate an underlying effect by pooling acupuncture trials are hamstrung by the complexity of the intervention and the relatively small number of trials available. Pooling just a few trials rules out exploring the causes of heterogeneity. Prespecifying which trials are to be pooled is preferable, but random-effects meta-analysis is not advised for fewer than four trials, and heterogeneity cannot be assessed reliably in a small sample. Adding to the difficulty of assessing whether a trial or group of trials is ‘biased’, different dimensions of bias appear to have different effects in individual trials and across systematic reviews.9–11
Conclusions
Authors of systematic reviews of acupuncture should be cautious about using meta-analysis; in particular, the potential effect of bias should be seriously considered. Exploring heterogeneity using meta-regression is unlikely to explain the variation present. CIs should be interpreted with caution because they do not reflect the variation between trials.
Summary points

Acupuncture studies are often heterogeneous, so systematic reviewers often choose the random-effects model for meta-analysis.

The random-effects model rests on certain assumptions, particularly the absence of systematic bias.

In a sample of typical acupuncture study reports, we found these assumptions were not met.
References
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
 Data supplement 1  Online supplement
 Data supplement 2  Online tables
Footnotes

Contributors SLP designed and analysed the study, which was advised upon by JB, SC and HM. JB also advised on and contributed to the screening and general data extraction. The authors thank Gillian Worthy, Alison Longridge, Laura Vanderbloemen and Ann Hopton who contributed to the screening and general data extraction.

Competing interests None.

Funding The Foundation for Traditional Chinese Medicine financially contributed towards the project.

Patient consent Obtained.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement Data are available on request.