The impact of STRICTA and CONSORT on reporting of randomised control trials of acupuncture: a systematic methodological evaluation
  1. Simen Svenkerud,
  2. Hugh MacPherson
  1. Department of Health Sciences, Faculty of Science, University of York, York, UK
  1. Correspondence to Simen Svenkerud, Department of Health Sciences, University of York, York YO10 5DD, UK; ss1996{at}york.ac.uk

Abstract

Background Clear and unambiguous reporting is essential for researchers and clinicians to be able to assess the quality of research. To enhance the quality of reporting, consensus-based reporting guidelines are commonly used.

Objectives To update and extend previous research by evaluating the more recent impact of STRICTA (STandards for Reporting Interventions in Controlled Trials of Acupuncture) and CONSORT (CONsolidated Standards Of Reporting Trials) guidelines on the quality of reporting of acupuncture trials.

Methods By random sampling, approximately 45 trials from each of five 2-year time periods between 1994 and 2015 were included in the study. Using scoring sheets based on the STRICTA and CONSORT checklist items (range 0 to 7 and 0 to 5, respectively), the distribution of items reported over time was investigated, with changes shown using scatterplots. The primary analysis used a before-and-after t-test to compare time periods. A meta-analysis investigated whether or not trials published in journals that endorsed STRICTA were associated with better reporting.

Results The study included 207 trials. Improved reporting of items over time was observed, as represented by changes in the scatterplot slope and intercept. The mean STRICTA score increased from 4.27 in the 1994–1995 period to 5.53 in 2014–2015, an 18% improvement. The mean CONSORT score rose from 1.01 in the 1994–1995 period to 3.32 in 2014–2015, an increment of 46%. There was proportionately lower reporting for items related to practitioner background (STRICTA) and for randomisation implementation and allocation concealment (CONSORT). Trials published in journals that endorsed STRICTA had statistically significantly superior reporting of both STRICTA and CONSORT items overall.

Conclusion This study has provided evidence of an improvement in reporting of STRICTA and CONSORT items over the time period from 1994 to 2015. Journals that endorse STRICTA have a better record in terms of reporting quality. Some evidence suggests that the publication of STRICTA has had a positive impact on reporting quality.

  • acupuncture

Introduction

When Begg et al published the first edition of the ‘CONsolidated Standards Of Reporting Trials’ (CONSORT) guidelines, their aim was to improve reporting quality in randomised control trials by providing a checklist for authors to follow when writing up their articles for publication.1 CONSORT has since been updated in 2001 and 2010.2 3 Today CONSORT is widely accepted as standard for reporting of clinical trials. A series of studies have been conducted to assess the impact of CONSORT in general medical fields, which show that CONSORT has positively impacted reporting quality and research quality.4–8

The ‘STandards for Reporting Interventions in Controlled Trials of Acupuncture’ (STRICTA) guidelines were first published in 2001 and a revised version was published in 2010.9 10 The aim of STRICTA is to improve reporting of the interventions within clinical trials of acupuncture, thereby extending the CONSORT guidelines. An early review that evaluated the impact of STRICTA on acupuncture trial reporting concluded that there had been a statistically significant increase in the reporting of CONSORT items between 1996 and 2005, but over the same period there was no statistical evidence for an improvement in STRICTA reporting.11

The aim of this specific study was to update and expand upon previous research by Prady et al 11 by investigating the impact of STRICTA and CONSORT on the quality of reporting of acupuncture research. Our objective was to compile a larger dataset over a longer time period and conduct a higher level statistical analysis in order to improve and extend previous research.

Methods

The methodological basis of the study was a systematic review; however, the area of interest was the methodology of the published articles rather than a specific condition. The gold standard for a systematic review was adjusted accordingly to allow for this focus. This study can be seen as an expansion of previous research by Prady et al.11

Time periods

As a baseline, we chose to collect data over the 2-year time period 1994–1995 as this was before the publication of CONSORT and STRICTA. We also collected data from a further four distinct time periods, namely 1999–2000, 2004–2005, 2009–2010, and 2014–2015. These periods can be interpreted in the context of the publication dates of both STRICTA (in 2001 and 2010) and CONSORT (in 1996, 2001 and 2010). For comparability between this study and the one by Prady et al,11 we used the same inclusion/exclusion criteria.

Screening process and study selection

Our initial search found 12 514 potential acupuncture trials (figure 1). After manually removing duplicates and applying time filters, we used random selection (with the aim of achieving a pre-defined sample of 45 randomised control trials in each time period) followed by screening against the inclusion and exclusion criteria. To prevent the high attrition rate from screening that was observed by Prady et al,11 we continued to randomly select and screen articles until the full sample size had been achieved. Additionally, where trials had to be excluded after the full text had been obtained, they were replaced using the same random selection procedure as applied during initial screening. This procedure ensured a fully random final sample.
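The select-and-screen loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SPSS procedure: the function name `sample_until_quota`, the `is_eligible` callback and the seed are all hypothetical.

```python
import random

def sample_until_quota(pool, is_eligible, quota=45, seed=0):
    """Screen articles in random order until `quota` eligible ones are found.

    Returns (included, n_screened). If the pool is exhausted first, fewer
    than `quota` articles are returned.
    """
    rng = random.Random(seed)       # fixed seed makes the draw reproducible
    order = list(pool)
    rng.shuffle(order)              # random order = sampling without replacement
    included, n_screened = [], 0
    for article in order:
        if len(included) == quota:
            break
        n_screened += 1
        if is_eligible(article):    # stands in for the inclusion/exclusion check
            included.append(article)
    return included, n_screened
```

Because exclusions are replaced by further draws from the same shuffled order, the final sample stays random. When a period holds fewer eligible trials than the quota, the function simply returns all of them.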

Figure 1

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart showing the flow of articles through the study.

Sample size calculations

We assumed that the initial differences in STRICTA scores between time periods and related standard deviations (SDs) found by Prady et al 11 (13.5 and 22.4, respectively) represented a reasonable estimate for all time periods. For the group comparisons, at a significance level of 0.05 with 80% statistical power, we estimated that we would require a sample size per time period of n=45 papers, that is, n=225 papers overall. Due to limited resources available to the project, and the high level of comparability, we requested and received permission to include data from Prady et al 11 in our analysis.
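As a rough check on this calculation, the standard normal-approximation formula for comparing two means can be applied to the quoted difference (13.5) and SD (22.4). The authors do not state which formula or software they used, so the function below (`n_per_group`, a hypothetical name) is only a sketch of one common approach.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means,
    using the normal approximation n = 2 * ((z_a + z_b) * sd / delta) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)

print(n_per_group(delta=13.5, sd=22.4))  # 44
```

The approximation gives 44 per group. A t-distribution correction would be expected to add roughly one more, which is in line with the n=45 reported here.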

Primary outcomes

Outcome measures for this study were the item scores using the STRICTA and CONSORT rating scales based on the checklists used by Prady et al.11 The primary outcomes for the study were the STRICTA and CONSORT checklist item scores over time for acupuncture trials as an indicator of the impact of publication and revision of the STRICTA and CONSORT guidelines. Each item on the STRICTA and CONSORT checklists was re-framed as a question or series of sub-questions (see tables 1 and 2 for exact formulations). Each item was given equal weight, such that the item’s question(s) contributed a score between 0 and 1 that, when totalled, was within the range 0 to 7 and 0 to 5 points for STRICTA and CONSORT, respectively.
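The equal-weight scoring rule can be illustrated with a short sketch. It assumes that each sub-question is answered yes or no and that sub-questions share their item's single point equally; the text does not spell out the split, and the function name `checklist_score` is hypothetical.

```python
def checklist_score(items):
    """Total score for one trial.

    `items` is a list with one entry per checklist item; each entry is a list
    of booleans, one per sub-question. Every item contributes between 0 and 1.
    """
    return sum(sum(answers) / len(answers) for answers in items)

# Hypothetical trial scored on three items:
# item A has two sub-questions (one reported), item B has one (reported),
# item C has four (two reported).
trial = [[True, False], [True], [False, False, True, True]]
print(checklist_score(trial))  # 2.0
```

With seven STRICTA items the total falls between 0 and 7, and with five CONSORT items between 0 and 5.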

Table 1

STRICTA item distribution

Table 2

CONSORT item distribution

Statistical analysis

We used the statistical analysis software package SPSS for Windows (2013; IBM-Corp, Armonk, NY, USA) to identify a random selection of articles. Using Stata 14 (2015; StataCorp, College Station, TX, USA), we performed t-tests to compare total STRICTA scores and combined STRICTA+CONSORT scores between time periods. The data for the CONSORT scores were found not to meet the requirements for a parametric analysis and thus were analysed using the Wilcoxon rank sum test instead. We examined for a statistically significant change between adjacent time periods, as well as between the period before first publication and the most recent time period.
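For readers unfamiliar with the Wilcoxon rank sum test, the sketch below shows how the statistic is built from ranks. It is an assumption-laden illustration rather than the Stata routine used in the study: it uses the large-sample normal approximation with a tie correction and no continuity correction, and the function name `rank_sum_test` is hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (w, z, p), where w is the sum of the ranks of sample x.
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1                       # j ends one past the block of tied values
        midrank = (i + 1 + j) / 2        # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```

The test compares the positions of the two samples in the pooled ranking rather than their means, so it provides a P value but no mean difference or CI on the score scale.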

Secondary analyses included the distribution of individual STRICTA and CONSORT items (using percentages) over time, a simple regression of item scores against time, and a meta-analysis of the effect of endorsing STRICTA and/or CONSORT on adherence to item reporting.

Results

Study characteristics

After removal of duplicates, 4285 articles matched the search criteria. Through random sampling, 1052 articles were screened for inclusion, and 207 articles were included in the final analysis. See figure 1 for our PRISMA flowchart, and table 3 for relevant study characteristics. Note should be made that for 1994–1995 we only achieved a sample size of 26 papers. The reason for this was that our search only identified 26 studies eligible for inclusion within this time period.

Table 3

Characteristics of included studies

Overall STRICTA and CONSORT total scores over time

We observed a rise in the mean total STRICTA score (see table 2) from 4.27 in the 1994–1995 period to 5.53 in 2014–2015, which represents a 1.25 point improvement (18%). The mean CONSORT score rose from 1.01 in the 1994–1995 period to 3.32 in 2014–2015, an increment of 2.31 points (46%). The overall score when STRICTA and CONSORT were combined as a single value rose from 5.29 in the 1994–1995 period to 8.85 in 2014–2015, an improvement of 3.56 points (30%).
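The percentages quoted above appear to express each change as a share of the maximum possible score (7, 5 and 12 points, respectively) rather than as a change relative to baseline. That reading is our inference from the arithmetic, not something the text states; the sketch below simply reproduces it.

```python
def pct_of_scale(change, scale_max):
    """Express a change in score as a percentage of the scale maximum."""
    return 100 * change / scale_max

for label, change, scale_max in [
    ("STRICTA", 1.25, 7),
    ("CONSORT", 2.31, 5),
    ("Combined", 3.56, 12),
]:
    print(f"{label}: {pct_of_scale(change, scale_max):.1f}%")
# STRICTA: 17.9%
# CONSORT: 46.2%
# Combined: 29.7%
```

Rounded to whole numbers, these match the reported 18%, 46% and 30%.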

When these data were presented graphically, we saw an overall trend towards increased quality of reporting following the publication of STRICTA (figure 2A). The red line, as a simple regression line, shows the trend in the data across all time periods, ignoring the gaps between them. The slope was relatively unchanged over the time period before the publication of STRICTA in 2001. By the 2004–2005 time period there was evidence of increased reporting quality according to the STRICTA scores, a trend that continued through to 2014–2015. A trend towards improvement was observed in both the CONSORT scores (figure 2B) and the combined STRICTA and CONSORT scores (figure 2C).
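A simple least-squares line of the kind drawn in figure 2 can be computed as in the sketch below. The study fitted trial-level scores, which are not reproduced here, so the example only passes a line through the two reported period means; the period midpoints (1994.5 and 2014.5) and the function name `least_squares` are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Mean STRICTA scores for the first and last periods, at assumed midpoints.
slope, intercept = least_squares([1994.5, 2014.5], [4.27, 5.53])
print(f"{slope:.3f} points per year")  # 0.063 points per year
```

A single line fitted across all periods averages over any change in slope, which is why the text describes the pre-2001 and post-2001 trends separately.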

Figure 2

Distribution of item reporting scores over time. All plots show an increase in reporting quality over time. (A) STRICTA item reporting score. (B) CONSORT item reporting score. (C) combined STRICTA+CONSORT item reporting score. Solid red line=fitted least squares for aid of visualisation of data.

Reporting of individual STRICTA and CONSORT items over time

Looking at the overall trends for STRICTA, we saw that, for all items except 2, 5, and 18, the highest reporting scores were in the more recent periods of either 2009–2010 or 2014–2015 (table 1). In 2014–2015 the best-reported items, adhered to in over 90% of all included articles, were frequency and number of treatments as well as the description of point locations (items 4, 12, 13, 21, 29 and 30). The area with the weakest reporting was practitioner information (items 15–17).

There was a relatively steady increase in the reporting of most CONSORT items throughout most of the time periods (table 2). Although the majority of CONSORT items were reported in over 75% of included studies in 2014–2015, the poorest reporting related to allocation concealment (51.1%; item 2.2), randomisation implementation (53.3%; item 2.1) and ‘intention-to-treat’ analysis (37.8%; item 5).

Before-and-after analysis of STRICTA and CONSORT

We observed statistically significant results for the change in STRICTA scores (table 4) for all time periods compared, with the exception of 1994–1995 versus 1999–2000 (P=0.91) and 1999–2000 versus 2004–2005 (P=0.2). Looking at the overall change from the earliest period (1994–1995) to the most recent time period (2014–2015) we found an overall statistically significant (P<0.001) improvement of 1.25 points (95% CI 0.84 to 1.72).

Table 4

Before-and-after t-test results for STRICTA scores

For CONSORT, Wilcoxon rank sum tests only provided P values (table 5). Statistically significant results were found for 1999–2000 versus 2004–2005 (P<0.001), while the remaining adjacent time periods showed no statistically significant differences. A statistically significant (P<0.001) improvement was found when comparing baseline (1994–1995) against the data from 2004–2005 and 2014–2015. For combined STRICTA and CONSORT scores (table 6), we found an overall change of 3.55 (95% CI 2.78 to 4.32).

Table 5

Wilcoxon rank sum test results for CONSORT scores

Table 6

Before and after t-test results for STRICTA and CONSORT

Meta-analysis of the effect of STRICTA endorsement

When comparing reporting completeness, expressed as a relative risk, between journals that endorsed the use of STRICTA and those that did not (figure 3), we found that overall there was more complete reporting in journals endorsing STRICTA. The overall relative risk of reporting completeness was 1.27 (95% CI 1.18 to 1.36). Looking at the individual items, the point estimates for a few items showed that they were better reported in non-endorsing journals (items 1.2, 1.3, 2.1, 6.3). By contrast, several items were significantly better reported in journals endorsing STRICTA (items 2.8, 3.1, 4, 6.12, 6.13, and 6.15).
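To make the relative risk concrete, the sketch below computes it for one checklist item from a two-by-two table and then pools several items. The counts are invented for illustration, the pooling model (inverse-variance fixed effect on the log scale) is an assumption because the text does not name the model used, and the function names are hypothetical.

```python
import math

def relative_risk(reported_a, total_a, reported_b, total_b, z=1.96):
    """Relative risk of an item being reported in group A versus group B,
    with a 95% CI from the usual log-scale standard error.
    Returns (rr, lower, upper, se_log_rr)."""
    rr = (reported_a / total_a) / (reported_b / total_b)
    se = math.sqrt(1 / reported_a - 1 / total_a + 1 / reported_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper, se

def pool_fixed(results, z=1.96):
    """Inverse-variance fixed-effect pooling of log relative risks."""
    weights = [1 / se ** 2 for _, _, _, se in results]
    log_pooled = sum(w * math.log(rr) for w, (rr, _, _, _) in zip(weights, results))
    log_pooled /= sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return (math.exp(log_pooled),
            math.exp(log_pooled - z * se_pooled),
            math.exp(log_pooled + z * se_pooled))

# Hypothetical item: reported in 30 of 40 trials in endorsing journals
# and in 20 of 40 trials in non-endorsing journals.
item = relative_risk(30, 40, 20, 40)
print(round(item[0], 2))  # 1.5
```

A relative risk above 1 with a CI that excludes 1 indicates that the item was reported significantly more often in endorsing journals.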

Figure 3

Meta-analysis of the effect of endorsement of STRICTA. The pooled estimate of likelihood of an item being reported clearly favours the paper being published in a STRICTA endorsing journal. This indicates that papers published in STRICTA endorsing journals are more likely to have higher reporting quality than papers reported in non-endorsing journals.

As observed for the STRICTA items, figure 4 shows a similar trend, in that CONSORT items appeared to be more completely reported in journals that endorsed the use of STRICTA compared with non-endorsing journals. The overall estimate of reporting completeness was 1.36 (CI 1.25 to 1.55). CIs of items 1, 3.2, 3.3 showed that they were significantly better reported in STRICTA-endorsing journals.

Figure 4

Meta-analysis on effect of endorsement of STRICTA on CONSORT reporting. The pooled likelihood of a CONSORT item being reported favours STRICTA endorsing journals; although not as clear as for STRICTA items it remains a statistically significant difference.

Discussion

Principal findings

We found statistical evidence for an improvement of reporting of both STRICTA and CONSORT checklist items over time. The meta-analysis provided evidence that endorsement of STRICTA by journals is associated with improved reporting not only of STRICTA items but also of CONSORT items. Our evidence suggests that STRICTA and its subsequent update have had an impact on reporting quality.

With respect to individual item reporting, we found that the highest reported STRICTA items were frequency and duration of treatment, as well as point descriptions, which were all better reported in later years. The weakest reported items were related to practitioner information, which displayed little change from baseline. The lowest reported CONSORT items were related to randomisation implementation, allocation concealment and intention-to-treat.

Our findings in context

We note that our study provides fresh evidence that updates the earlier study by Prady et al,11 who found little evidence to support the claim that the reporting of STRICTA items had improved by 2005, some 4 years after STRICTA was first published. A study published in 2011 investigated the reporting quality of trials using a tool they developed on the basis of STRICTA and CONSORT, named the Oregon-STRICTA-CONSORT-Index (OSCI).12 Possibly due to the larger sample size and extended time period, they observed a statistically significant rise in the STRICTA score from 1997 to 2007.12 Our study, which is an update and extension of Prady et al 11 that includes subsequent time periods to 2015, larger sample size and additional analyses, supports the case that it might take between 5 and 10 years for reporting guidelines to become more widely adopted within the research community.

Studies on the impact of STRICTA and CONSORT on acupuncture trials published in the Chinese and Korean languages13 14 have shown that there has been an increase in reporting quality over time. A review of the impact of STRICTA found an increase in the citations of STRICTA over time, with publication date being a significant predictor of STRICTA citation.15 Kim et al 16 investigated the level of reporting of selected STRICTA items in systematic Cochrane reviews compared with reporting within the constituent randomised controlled trials (RCTs). They found that the level of reporting was statistically significantly lower in the systematic reviews, and argued for better reporting quality within such reviews.

Strength and limitations

The primary strength of our study lies in the extended time span as well as a large sample size. This allowed us to investigate and observe a broader picture compared with previous studies. Furthermore, we used assessment tools that have previously been tested, although not statistically validated. Limitations of the study include the fact that the STRICTA and CONSORT checklists were never intended to be used as rating scales of research reporting quality.17 The original intent was to provide guidance on what items should be reported when writing up RCTs for publication. The value of the weighting chosen in this study is also debatable. It was based on that used by Prady et al,11 whereby individual checklist items were all equally weighted; Hammerschlag et al 12 likewise used an equal weighting system. We suggest that some items are clearly more important than others.

A further weakness is that data extraction was performed by only one investigator, due to limited resources. Language restrictions were applied, such that all included studies were published in English. There is, however, little reason to believe that this would have introduced excessive systematic bias, as similar reviews performed on non-English studies13 14 18 are consistent with our own results. This study did not assess the impact of potential confounding factors on the question of causality. Such an assessment could have allowed us to quantify the extent to which STRICTA has influenced the improvement in the quality of reporting.

Implications for research and practice

While we endorse the practice of regular revisions of reporting guidelines in order that they are adapted to the changing environment of acupuncture research, Liu et al 19 have suggested that it might be time to consider alternative routes to improving reporting quality. They reported limited adherence to reporting of STRICTA items among journals that have endorsed STRICTA. One possible way forward would be to better support journals and editors in order to enhance reporting of items, with editors engaging more specifically with authors on their reporting of STRICTA items at the submission stage.

In the section on strengths and limitations above, we discussed the issue that STRICTA has not been validated as a quality scoring tool, regardless of the fact that several studies have used it in that way. The development of an appropriately weighted and properly validated scoring tool to assess the quality of reporting of acupuncture research is needed.

During the conduct of this study, we found that the use of before-and-after tests to assess changes in levels of reporting presents a series of challenges. These challenges include lack of control over confounding factors, natural changes in reporting trends, and causality of impact. We therefore recommend that future research adopt an interrupted time series analysis (ITSA) design to address these challenges. Computation using ITSA requires continuous data, thereby avoiding any null periods, as was the case in the study reported here. This approach would require a larger sample size of trials to be evaluated.
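One common way to set up such an analysis is segmented regression, which estimates a baseline level and trend plus a change in level and a change in trend at the intervention date. The sketch below is a minimal illustration under that assumption; it is not an analysis performed in this study, it ignores autocorrelation, and the function names and example data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]          # partial pivoting
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_itsa(years, scores, break_year):
    """Least-squares fit of
    score = b0 + b1*time + b2*post + b3*post*(year - break_year),
    where time is years since the first observation and post is 1 from
    break_year onwards. Returns [b0, b1, b2, b3]."""
    origin = min(years)
    rows = []
    for t in years:
        post = 1.0 if t >= break_year else 0.0
        rows.append([1.0, float(t - origin), post, post * (t - break_year)])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)
```

Here `b2` would estimate an immediate step in reporting scores at the guideline's publication and `b3` a change in the yearly trend afterwards. Fitting the model needs observations on both sides of the break, which is why continuous yearly data are required.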

Conclusion

We found a statistically significant improvement in reporting quality of both STRICTA and CONSORT checklist items from 1994 to 2015. Journal endorsement of STRICTA is associated with better reporting. By combining the data across analyses, we provided some evidence to suggest that the publication of STRICTA has had a positive impact on reporting quality. We are cautious regarding this causal relationship, due to the possible influence of confounding factors. Future research on the impact of reporting guidelines would benefit from using an ITSA in order to be able to explore causality. We endorse the practice of regular revisions of reporting guidelines in order to adapt to the ever-changing environment of acupuncture research.

Acknowledgments

We would like to acknowledge the assistance of Dr Stephanie Prady in allowing us to use the data from her study. We also would like to thank Professor David Torgerson and Dr Mona Kanaan for their valuable advice and guidance during the execution of this study.

Footnotes

  • Contributors SS designed the study, collected and analysed the data, interpreted the results and drafted the manuscript. HM supervised and guided all stages of the project, interpreted the results and contributed to the write-up. All authors read and approved the final version of the manuscript accepted for publication.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.
