Objective The authors investigated the quality of reporting for randomised controlled trials of auriculotherapy for pain before and after the implementation of the Consolidated Standards of Reporting Trials (CONSORT) and Standards for Reporting Interventions in Controlled Trials of Acupuncture (STRICTA) guidelines.
Methods The authors identified randomised controlled trials of auriculotherapy that measured pain or pain medication use as a primary outcome and were published in English in peer-reviewed journals. Proportions of studies that reported STRICTA and CONSORT items were compared for the years before and after implementation of STRICTA (2001) using Fisher's exact tests. Global differences across all study factors were investigated using hierarchical clustering and principal component analysis (PCA).
Results Fifteen studies met our inclusion criteria. On average, 11 studies (74%) reported STRICTA items and eight studies (54%) reported CONSORT items. Differences in reporting between pre- and post-STRICTA studies were found for two CONSORT items (randomised sequence and treatment provider) but no STRICTA items. However, the results of cluster analysis and PCA detected global differences over time for both STRICTA and CONSORT items.
Conclusion Quality of reporting for studies of auriculotherapy for pain appears to have generally improved since the implementation of STRICTA and CONSORT guidelines.
The quality of reporting for acupuncture studies has improved over time,1 2 especially since the implementation of the Standards for Reporting Interventions in Controlled Trials of Acupuncture (STRICTA) guidelines in 2001,3 suggesting a concomitant improvement in the overall design and conduct of acupuncture clinical trials. Auriculotherapy is a field related to, but distinct from, traditional full-body acupuncture. Although we expect recent studies of auriculotherapy to follow trends in quality of reporting similar to those for full-body acupuncture, we are not aware of any studies that have examined the quality of reporting for trials of auriculotherapy.
We recently conducted a systematic review and meta-analysis of randomised controlled studies of auriculotherapy for treatment of pain.4 Seventeen studies met inclusion criteria and were included in the final publication. Using data extracted from the systematic review, we describe the quality of reporting for auriculotherapy trials. Since the included studies span across the years that the Consolidated Standards of Reporting Trials (CONSORT) guidelines and its extension for trials of acupuncture (STRICTA) were introduced, we examined how the implementation of these guidelines may have influenced the reporting of auriculotherapy trials.
We previously conducted a systematic literature review of randomised controlled studies of auriculotherapy for treatment of pain.4 Medline, Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, Allied and Complementary Medicine Database (AMED), ISI Web of Science, and Cumulative Index to Nursing and Allied Health Literature (CINAHL) were searched from inception through December 2008. Trials were included if they: (1) were randomised; (2) compared auriculotherapy to sham auriculotherapy, standard medical care or waiting-list control; (3) measured the effect on pain or medication use; (4) were published in English in a peer-reviewed journal. Details of our search strategy, selection process, extraction and quality rating methods have been reported previously.4
For the systematic review, trained reviewers abstracted data from each study into a structured data abstraction form. A second reviewer read each abstracted article and evaluated the completeness of the data abstraction. For our current study, these data were entered into an Excel spreadsheet by a trained data extractor and checked for accuracy by a second extractor. We used the STRICTA and CONSORT assessment checklists devised by Prady et al2 as guides for our data extraction (table 1). For elements of the checklists that were not included in our previous abstraction, we returned to the original publication to complete all elements of the checklists. Individual reporting items and categories were summarised as percentages of trials that reported the item.
We used a pre-post design to investigate possible changes in quality of reporting after implementation of the STRICTA guidelines. Since the STRICTA guidelines were published in 2001,3 5 we divided studies into those published prior to 2001 (early) and those published after 2003 (late). We chose to exclude studies published in the three years after dissemination of STRICTA to allow time for the research community to assimilate the guidelines.6 The CONSORT guidelines were initially published in 1996 with a revision published in 2001. Because all of the pre-STRICTA studies we identified were published prior to the initial dissemination of CONSORT, we included an analysis of reporting of CONSORT items based on the same division of studies as the STRICTA analysis.
The distribution for each of the items was compared between groups using Fisher's exact tests. Because the sample size was limited, parametric tests could not be performed. One-sided p values were calculated since the null hypothesis was that the appearance of the STRICTA guidelines did not improve these measures. In order to correct for multiple comparisons, a permutation test (with 1000 permuted datasets) was performed to empirically determine p value cut-offs for significance that corresponded to a family-wise error rate of 0.05.7 The permutation test empirically derived that an uncorrected p value of 0.01 was statistically significant. Means for each category were derived from the individual items of that category.
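The item-level comparison described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis: the function names, the three-item data matrix and the "minimum p value" form of the permutation procedure are all assumptions, since the text does not specify how the cut-off was derived from the permuted datasets.

```python
import random
from math import comb


def fisher_one_sided(late_yes, late_no, early_yes, early_no):
    """One-sided Fisher's exact p value: probability that the late group
    reports an item at least as often as observed, with margins fixed."""
    n_late = late_yes + late_no
    n_yes = late_yes + early_yes
    n = n_late + early_yes + early_no
    upper = min(n_late, n_yes)
    tail = sum(comb(n_late, k) * comb(n - n_late, n_yes - k)
               for k in range(late_yes, upper + 1))
    return tail / comb(n, n_yes)


def item_p_values(reported, is_late):
    """One p value per checklist item (column) of the 0/1 matrix."""
    p_values = []
    for j in range(len(reported[0])):
        late = [row[j] for row, flag in zip(reported, is_late) if flag]
        early = [row[j] for row, flag in zip(reported, is_late) if not flag]
        p_values.append(fisher_one_sided(sum(late), len(late) - sum(late),
                                         sum(early), len(early) - sum(early)))
    return p_values


def min_p_cutoff(reported, is_late, n_perm=1000, fwer=0.05, seed=0):
    """Assumed min-p procedure: shuffle the early/late labels, record the
    smallest item p value per shuffle, and return the value below which an
    observed p value is declared significant (use p < cutoff)."""
    rng = random.Random(seed)
    labels = list(is_late)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        min_ps.append(min(item_p_values(reported, labels)))
    min_ps.sort()
    return min_ps[int(fwer * n_perm)]


# Hypothetical data: 5 early and 10 late studies, 3 checklist items.
IS_LATE = [False] * 5 + [True] * 10
REPORTED = ([[0, 1, 1]] * 3 + [[0, 0, 1], [0, 0, 0]]              # early
            + [[1, 1, 1]] * 7 + [[1, 0, 1], [1, 0, 0], [0, 0, 0]])  # late

P_VALUES = item_p_values(REPORTED, IS_LATE)
CUTOFF = min_p_cutoff(REPORTED, IS_LATE)
print(P_VALUES, CUTOFF)
```

In this made-up dataset the first item is reported by none of the five early studies and nine of the 10 late studies, which gives a one-sided p value of 10/5005 (about 0.002). Comparing each observed p value with the permutation-derived cut-off, rather than with 0.05, is what controls the error rate across all items at once.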
Global differences across all study items (both STRICTA and CONSORT) were investigated using two approaches. First, to visualise the similarity of studies across all items, unsupervised hierarchical clustering was performed using a complete-linkage distance metric.8 Hierarchical clustering assigns a set of observations into subsets (called clusters) so that observations in the same cluster are more similar (having shorter complete-linkage distances) than observations between adjacent clusters. For example, two studies that fulfilled all STRICTA and CONSORT items would have a very short linkage distance compared to two studies that differentially fulfilled some, but not all, items. For the current study, ‘bottom up’ agglomerative clustering was used such that each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Studies closer together in the dendrogram are therefore more alike than those further apart.
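A minimal sketch of bottom-up, complete-linkage clustering on yes/no checklist data follows. The pairwise distance between two studies is assumed here to be the number of items on which they differ (Hamming distance); the text does not state which underlying distance was used, and the four example studies are hypothetical.

```python
def hamming(u, v):
    """Number of checklist items on which two studies differ (assumed metric)."""
    return sum(a != b for a, b in zip(u, v))


def complete_linkage(cluster_a, cluster_b, rows):
    """Complete linkage: the largest pairwise distance between members."""
    return max(hamming(rows[i], rows[j]) for i in cluster_a for j in cluster_b)


def agglomerate(rows):
    """Start with one cluster per study and repeatedly merge the two
    closest clusters; returns the merge history as (left, right, distance)."""
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = complete_linkage(clusters[x], clusters[y], rows)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Hypothetical studies: two that report nearly everything, two that report little.
STUDIES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
MERGES = agglomerate(STUDIES)
for left, right, distance in MERGES:
    print(left, right, distance)
```

The merge history is the information a dendrogram draws: the two well-reported studies join at a distance of 1, the two poorly reported studies join at a distance of 1, and the two resulting clusters only join at a distance of 4.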
Secondly, these global patterns were more formally tested for differences between early and late studies using unscaled principal component analysis (PCA).9 PCA is used to develop a small number of summary variables (called principal components) that will account for most of the variance in the original set of observed variables. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Tracy–Widom tests were used to test the significance of the components from the PCA (to test that these components explain a significant amount of variation in the data).10 The results of a PCA are usually discussed in terms of component scores (eigenvalues) and loadings. Wilcoxon rank sum tests were used to compare eigenvalues from the significant components for early versus late studies. A loading plot (not shown) was created to determine which questions primarily accounted for global differences within each principal component. The loadings can be interpreted as the weights for each individual item when calculating the principal component, such that higher loading values represent a stronger contribution of the item to that component. Analyses were performed in Stata V.11 and JMP V.7.
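To make the ideas of loadings, variance explained and per-study scores concrete, the sketch below extracts the first component of an unscaled PCA (columns centred but not standardised) in plain Python using power iteration. It is not the Stata or JMP procedure used for the analysis, it omits the Tracy–Widom and Wilcoxon tests, and the six-study dataset and all function names are hypothetical.

```python
def center(rows):
    """Subtract each column mean; no rescaling, so the PCA is 'unscaled'."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of the centred item columns."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iters=500):
    """Power iteration: returns (largest eigenvalue, unit-length loadings).
    The starting vector is arbitrary; it must not be orthogonal to the
    leading eigenvector for the iteration to converge to it."""
    p = len(cov)
    v = [float(j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


# Hypothetical data: 3 early then 3 late studies, 3 checklist items (1 = reported).
DATA = [[0, 0, 1], [0, 0, 1], [0, 1, 1],
        [1, 1, 1], [1, 1, 1], [1, 1, 0]]

CENTERED = center(DATA)
COV = covariance(CENTERED)
EIGENVALUE, LOADINGS = first_component(COV)

# Share of total variance carried by the first component.
EXPLAINED = EIGENVALUE / sum(COV[i][i] for i in range(len(COV)))

# Score of each study on the first component (weighted sum of its items).
SCORES = [sum(x * w for x, w in zip(row, LOADINGS)) for row in CENTERED]
print(EXPLAINED, LOADINGS, SCORES)
```

In this toy dataset the early and late studies fall on opposite sides of the first component, which is the kind of separation a rank-based comparison of the two groups would then test. Items with loadings of larger magnitude contribute more to that separation.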
Of the 17 studies from our previous systematic review, two were excluded since they were published in 2003. Publication dates ranged from 1986 to 2007; five studies (early group) were published prior to 2001 (range 1986–1994) and 10 (late group) were published after 2003 (range 2004–2007); no studies met our inclusion criteria for the years 1995–2001 and 2008. None of the studies were published in journals that have adopted the STRICTA guidelines.
Overall, the mean proportion of studies reporting STRICTA and CONSORT items was 74% and 54% respectively (see table 1; a version of the table including raw study numbers is available online). For five of the six categories of STRICTA items, 70% or more of studies reported on that category; only the acupuncture practitioner details were reported less frequently (24%). Specifically, length of clinical experience was reported in 47% of the studies, while length of acupuncture training and study condition-specific expertise were each reported 13% of the time. Particularly low reporting was also noted for justification of the control procedure (25%). The mean proportions of studies reporting for each of the five CONSORT categories ranged from 33% (use of intention-to-treat analysis) to 80% (reporting of baseline characteristics of the study population).
Statistically significant differences between early and late studies were not found for any individual STRICTA items, but differences were noted for two CONSORT items: randomisation sequence generation and treatment provider blinding (see table); raw p values less than 0.01 were considered statistically significant. The overall difference in reporting between early and late studies for all STRICTA items was 58% versus 80% (p=0.18) and for all CONSORT items was 17% versus 72% (p=0.02).
However, as shown by the cluster analysis, global differences across all CONSORT and STRICTA items were seen based on time of study (early vs late) (see figure 1). The results of the cluster analysis are shown in a dendrogram, which lists all of the studies and indicates at what level of similarity any two clusters were joined. The dendrogram is read from right to left with vertical lines showing joined clusters. The position of the vertical line on the horizontal axis indicates the relative linkage distance. The observed distances between clusters are rescaled so that the actual distances are not shown; however, the ratio of the rescaled distances within the dendrogram is the same as the ratio of the original distances. Small distances between vertical lines indicate high similarity between studies, while large distances between vertical lines indicate less similarity (dissimilarity). Similarity is a measure of how alike each study fulfilled the STRICTA and CONSORT criteria. For interpretation, individual studies have been colour coded according to their timing (early vs late studies). These results demonstrate that generally, early studies cluster together, and late studies cluster together.
From the results of the PCA, a total of nine components explained almost all of the variation in reporting. Of these nine, two components were significantly associated with the timing of the studies (early vs late) and explain a large amount of the variation in reporting (87%, p=0.006 and 4%, p=0.003) due to time.
When considering the loading plot from the PCA (available online), no single question appeared to have a strong influence on the primary PCA component, which is the component that explained 87% of the variation in timing. This is not surprising: given the small number of individual items showing significant differences (see table), we do not expect there to be a strong effect from any single item, but strong global effects may still be possible. Questions on style of acupuncture (STRICTA item A1) and justification of control procedure (STRICTA item F3) appeared to drive most of the difference for the second, smaller component. Finally, although all five early studies used non-acupuncture controls, nine of the 10 late studies used acupuncture control procedures.
The purpose of reporting guidelines such as CONSORT and STRICTA is to increase the transparency of study methods and ultimately improve the overall quality of research. It is known, for example, that inadequately generated or reported allocation concealment can bias the estimate of an intervention's effectiveness by as much as 30%.26 While CONSORT has clearly improved the reporting of studies within journals that have adopted those guidelines,6 27 the impact of the STRICTA guidelines on reporting of acupuncture trials is less well known.
Few studies have evaluated the effect the STRICTA and CONSORT guidelines have had on reporting of acupuncture trials, and no studies have reported on these effects for trials of auriculotherapy. Linde et al1 evaluated the quality of reporting of acupuncture trials for asthma and recurrent headache using the Jadad scale. The majority of trials had poor reporting or inadequate methods of randomisation, allocation concealment, blinding and dropouts. When considering factors associated with better reporting, they found that larger trials published more recently in journals indexed in Medline and in English were more likely to score higher than those without these characteristics. Similarly, Prady et al found that more recent studies of acupuncture tended to have better reporting than older studies.2
Our study found that overall, the quality of reporting was improved for studies published after 2003 compared to those published before 1994. However, there were very few individual guideline items that were clearly reported more frequently in the late study group compared to the early study group; this is most likely due to the low number of publications available for this study. The difference in early versus late reporting of many guideline items approached statistical significance; notable are several of the STRICTA items from the practitioner details and control intervention categories (where early studies did not report on certain items at all), and almost all of the CONSORT items.
While global differences in reporting are apparent for early versus late studies, it remains unclear whether these differences are more attributable to the influence of CONSORT compared to STRICTA. Indeed, the period between our two study groups included the implementation of both CONSORT and STRICTA. In the evaluation of acupuncture trials performed by Prady et al,2 the authors concluded that reporting of CONSORT items had improved after the introduction of CONSORT but that the introduction of STRICTA did not improve the reporting of STRICTA items. Our findings suggest that reporting of CONSORT items has improved more broadly than for STRICTA items, but that there may be some improvement in reporting of certain STRICTA items such as details of the acupuncture practitioner and the justification of control procedures. While the CONSORT statement was introduced five years earlier than STRICTA, some authors suggest that the impact of STRICTA has been attenuated because few acupuncture studies are published in STRICTA-adopting journals.28
The number of publications available for our analysis precluded definitive conclusions about the effects of STRICTA on quality of reporting because there were too few studies available to detect small or moderate differences; statistically significant results were only detected for differences greater than 70%. However, there were broad improvements in all CONSORT items and smaller improvements in most STRICTA items. Although significance was not reached for most items, the trend of the point estimates was consistently in the direction of effect; these results are reflected in the global assessment of differences, even though they are strictly undetectable as individual guideline items. Another drawback of this study is that four of the 10 late group studies were performed by the same research team. However, the results of the global assessment demonstrate that there are clear differences in reporting for almost all of the early versus late studies, so these results are not entirely driven by a single study team. Finally, the trials included in this study were derived from a single systematic review of auriculotherapy trials for pain: any biases in trial inclusion from the systematic review would be repeated in this study.
The quality of reporting for studies of auriculotherapy for pain appears to have improved since the introduction of the STRICTA and CONSORT guidelines. While the unique contribution of each of these guidelines on the quality of reporting is difficult to assess, it appears that reporting of CONSORT items has improved more than reporting of STRICTA items.
▶ The quality of reporting for auriculotherapy studies appears to have improved over time.
▶ It is not clear whether improvements in reporting are due to the implementation of the CONSORT or STRICTA guidelines.
The authors would like to especially thank Joshua Pathman for his participation in the preparation of the data for analysis.
Funding This study was supported by a grant from the UNC Department of Family Medicine's Small Grants Program. GNA was supported by NIH/NCCAM grant T32AT003378.
Competing interests None.
Ethics approval This study was conducted with the approval of the IRB of the University of North Carolina at Chapel Hill.
Provenance and peer review Not commissioned; externally peer reviewed.