As healthcare costs rise, physicians and other stakeholders are now seeking innovative and effective ways to reduce the provision of low-value services.1,2 The Choosing Wisely® campaign aims to further this goal by promoting lists of specific procedures, tests, and treatments that providers should avoid in selected clinical settings.3 On February 21, 2013, the Society of Hospital Medicine (SHM) released 2 Choosing Wisely® lists consisting of adult and pediatric services that are seen as costly to consumers and to the healthcare system, but which are often nonbeneficial or even harmful.4,5 A total of 80 physician and nurse specialty societies have joined in submitting additional lists.
Despite the growing enthusiasm for this effort, questions remain regarding the Choosing Wisely® campaign’s ability to initiate the meaningful de-adoption of low-value services. Specifically, prior efforts to reduce the use of services deemed to be of questionable benefit have met several challenges.2,6 Early analyses of the Choosing Wisely® recommendations reveal similar roadblocks and variable uptake of several recommendations.7-10 While the reasons for difficulties in achieving de-adoption are broad, one important factor in whether clinicians are willing to follow guideline recommendations from such initiatives as Choosing Wisely® is the extent to which they believe in the underlying evidence.11 The current work seeks to formally evaluate the evidence supporting the Choosing Wisely® recommendations, and to compare the quality of evidence supporting SHM lists to other published Choosing Wisely® lists.
Using the online listing of published Choosing Wisely® recommendations, a dataset was generated incorporating all 320 recommendations comprising the 58 lists published through August, 2014; these include both the adult and pediatric hospital medicine lists released by the SHM.4,5,12 Although data collection ended at this point, this represents a majority of all 81 lists and 535 recommendations published through December, 2017. The reviewers (A.J.A., A.G., M.W., T.S.V., M.S., and C.R.C.) extracted information about the references cited for each recommendation.
The reviewers obtained each reference cited by a Choosing Wisely® recommendation and categorized it by evidence strength along the following hierarchy: clinical practice guideline (CPG), primary research, review article, expert opinion, book, or others/unknown. CPGs were used as the highest level of evidence based on standard expectations for methodological rigor.13 Primary research was further rated as follows: systematic reviews and meta-analyses, randomized controlled trials (RCTs), observational studies, and case series. Each recommendation was graded using only the strongest piece of evidence cited.
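The grading rule described above (each recommendation takes the grade of its strongest cited reference) can be illustrated with a minimal Python sketch. The category labels, the `strongest_evidence` helper, and the example citations are hypothetical names chosen for illustration; the ordering follows the hierarchy stated in the text, with the primary research subtypes placed in the order listed.

```python
# Evidence categories ordered from strongest to weakest, following the
# hierarchy described in the text (labels are illustrative assumptions).
EVIDENCE_HIERARCHY = [
    "clinical practice guideline",
    "systematic review or meta-analysis",
    "randomized controlled trial",
    "observational study",
    "case series",
    "review article",
    "expert opinion",
    "book",
    "other/unknown",
]

# Lower rank means stronger evidence.
RANK = {category: i for i, category in enumerate(EVIDENCE_HIERARCHY)}


def strongest_evidence(cited_categories):
    """Return the strongest category among a recommendation's citations.

    A recommendation with no classifiable citation falls back to
    "other/unknown" (an assumption made for this sketch).
    """
    if not cited_categories:
        return "other/unknown"
    return min(cited_categories, key=lambda category: RANK[category])


# Hypothetical recommendation citing three references of differing strength.
example = ["review article", "observational study", "expert opinion"]
grade = strongest_evidence(example)
```

Under this rule, a single guideline citation is enough for a recommendation to be graded at the CPG level, regardless of how many weaker references accompany it.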
We further sought to evaluate the strength of referenced CPGs. To accomplish this, a 10% random sample of the Choosing Wisely® recommendations citing CPGs was selected, and the referenced CPGs were obtained. Separately, CPGs referenced by the SHM-published adult and pediatric lists were also obtained. For both groups, one CPG was randomly selected when a recommendation cited more than one CPG. These guidelines were assessed using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument, a widely used instrument designed to assess CPG quality.14,15 AGREE II consists of 23 items categorized into 6 domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence. Guidelines are also assigned an overall score. Two trained reviewers (A.J.A. and A.G.) assessed each of the sampled CPGs using a standardized form. Scores were then standardized using the method recommended by the instrument and reported as a percentage of available points. Although a standard interpretation of scores is not provided by the instrument, prior applications deemed scores below 50% as deficient.16,17 We also abstracted data on the year of publication, the evidence grade assigned to specific items recommended by Choosing Wisely®, and whether the CPG addressed the referring recommendation. All data management and analysis were conducted using Stata (V14.2, StataCorp, College Station, Texas).
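For readers unfamiliar with the score standardization, the AGREE II user's manual scales each domain as (obtained score − minimum possible score) / (maximum possible score − minimum possible score), where items are rated from 1 to 7 and the minimum and maximum are summed over all items and appraisers. The following Python sketch shows that calculation; the function name and the example ratings are hypothetical, and the analysis reported here was performed in Stata rather than with this code.

```python
def scaled_domain_score(ratings, low=1, high=7):
    """Scale AGREE II item ratings for one domain to a 0-100 percentage.

    ratings: one list of item scores per appraiser, for example
             [[5, 6, 4], [4, 5, 5]] for two appraisers and three items.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate the same items")

    obtained = sum(sum(r) for r in ratings)
    min_possible = low * n_items * n_appraisers
    max_possible = high * n_items * n_appraisers
    return 100 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings from two appraisers on a three-item domain.
score = scaled_domain_score([[5, 6, 4], [4, 5, 5]])
```

With these illustrative ratings the obtained total is 29, the minimum possible is 6, and the maximum possible is 42, giving (29 − 6) / (42 − 6), or roughly 64% of available points.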
A total of 320 recommendations were considered in our analysis, including 10 published across the 2 hospital medicine lists. When limited to the highest quality citation for each of the recommendations, 225 (70.3%) cited CPGs, whereas 71 (22.2%) cited primary research articles (Table 1). Specifically, 29 (9.1%) cited systematic reviews and meta-analyses, 28 (8.8%) cited observational studies, and 13 (4.1%) cited RCTs. One recommendation (0.3%) cited a case series as its highest level of evidence, 7 (2.2%) cited review articles, 7 (2.2%) cited editorials or opinion pieces, and 10 (3.1%) cited other types of documents, such as websites or books. Among hospital medicine recommendations, 9 (90%) referenced CPGs and 1 (10%) cited an observational study.
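As a consistency check on the distribution reported above, the counts can be tallied and converted to percentages of the 320 recommendations. This short Python sketch uses only the counts stated in the text; the dictionary keys and variable names are illustrative.

```python
# Highest-level evidence cited per recommendation, using counts from the text.
counts = {
    "clinical practice guideline": 225,
    "systematic review or meta-analysis": 29,
    "observational study": 28,
    "randomized controlled trial": 13,
    "case series": 1,
    "review article": 7,
    "editorial or opinion": 7,
    "other (websites, books)": 10,
}

total = sum(counts.values())

# Percentage of all recommendations in each category, to one decimal place.
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Primary research combines the four study-design subcategories.
primary_research = sum(
    counts[k]
    for k in (
        "systematic review or meta-analysis",
        "observational study",
        "randomized controlled trial",
        "case series",
    )
)

# Recommendations whose strongest support is a case series or weaker.
lower_quality = total - counts["clinical practice guideline"] - (
    primary_research - counts["case series"]
)
```

The categories sum to 320, the primary research subcategories sum to the 71 reported, and 25 recommendations rest on a case series, review article, opinion piece, or other document as their strongest citation.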
For the AGREE II assessment, we included 23 CPGs from the 225 referenced across all recommendations, after which we separately selected 6 CPGs from the hospital medicine recommendations. There was no overlap. Notably, 4 hospital medicine recommendations referenced a common CPG. Among the random sample of referenced CPGs, the median overall score obtained by using AGREE II was 54.2% (IQR 33.3%-70.8%, Table 2). This was similar to the median overall score among hospital medicine guidelines (58.2%, IQR 50.0%-83.3%). Both hospital medicine and other sampled guidelines tended to score poorly in stakeholder involvement (48.6%, IQR 44.1%-61.1% and 47.2%, IQR 38.9%-61.1%, respectively). There were no significant differences between hospital medicine-referenced CPGs and the larger sample of CPGs in any AGREE II subdomains. The median time from the CPG publication to the list publication was 7 years (IQR 4–7) for hospital medicine recommendations and 3 years (IQR 2–6) for the nonhospital medicine recommendations. Substantial agreement was found between raters on the overall guideline assessment (ICC 0.80, 95% CI 0.58-0.91; Supplementary Table 1).
In terms of recommendation strengths and evidence grades, several recommendations were backed by Grades II–III (on a scale of I–III) evidence and level C (on a scale of A–C) recommendations in the reviewed CPG (Society for Maternal-Fetal Medicine, Recommendation 4, and Heart Rhythm Society, Recommendation 1). In one other case, the cited CPG did not directly address the Choosing Wisely® item (Society for Vascular Medicine, Recommendation 2).
Given the rising costs and the potential for iatrogenic harm, curbing ineffective practices has become an urgent concern. To achieve this, the Choosing Wisely® campaign has taken an important step by targeting certain low-value practices for de-adoption. However, the evidence supporting recommendations is variable. Specifically, 25 recommendations cited case series, review articles, or lower quality evidence as their highest level of support; moreover, among recommendations citing CPGs, quality, timeliness, and support for the recommendation item were variable. Although the hospital medicine lists tended to cite higher-quality evidence in the form of CPGs, these CPGs were often less recent than the guidelines referenced by other lists.
Our findings parallel those of other works that evaluate evidence among Choosing Wisely® recommendations and, more broadly, among CPGs.18–21 Lin and Yancey evaluated the quality of primary care-focused Choosing Wisely® recommendations using the Strength of Recommendation Taxonomy, a ranking system that evaluates evidence quality, consistency, and patient-centeredness.18 In their analysis, the authors found that many recommendations were based on lower quality evidence or relied on nonpatient-centered intermediate outcomes. Several groups, meanwhile, have evaluated the quality of evidence supporting CPG recommendations, finding it to be highly variable as well.19–21 These findings likely reflect inherent difficulties in the process by which guideline development groups distill a broad evidence base into useful clinical recommendations, a reality that may have influenced the Choosing Wisely® list development groups seeking to make similar recommendations on low-value services.
These data should be taken in context due to several limitations. First, our sample of referenced CPGs includes only a small fraction of all CPGs cited; thus, it may not be representative of all referenced guidelines. Second, the AGREE II assessment is inherently subjective, despite the availability of training materials. Third, data collection ended in April, 2014. Although this represents a majority of published lists to date, it is possible that more recent Choosing Wisely® lists include a stronger focus on evidence quality. Finally, references cited by Choosing Wisely® may not be representative of the entirety of the dataset that was considered when formulating the recommendations.
Despite these limitations, our findings suggest that Choosing Wisely® recommendations vary in terms of evidence strength. Although our results reveal that the majority of recommendations cite guidelines or high-quality original research, evidence gaps remain, with a small number citing low-quality evidence or low-quality CPGs as their highest form of support. Given the barriers to the successful de-implementation of low-value services, such campaigns as Choosing Wisely® face an uphill battle in their attempt to prompt behavior changes among providers and consumers.6-9 As a result, it is incumbent on funding agencies and medical journals to promote studies evaluating the harms and overall value of the care we deliver.
Although a majority of Choosing Wisely® recommendations cite high-quality evidence, some reference low-quality evidence or low-quality CPGs as their highest form of support. To overcome clinical inertia and other barriers to the successful de-implementation of low-value services, a clear rationale for the impetus to eradicate entrenched practices is critical.2,22 Choosing Wisely® has provided visionary leadership and a powerful platform to question low-value care. To expand the campaign’s efforts, the medical field must be able to generate the high-quality evidence necessary to support these efforts; further, list development groups must consider the availability of strong evidence when targeting services for de-implementation.
This work was supported, in part, by a grant from the Agency for Healthcare Research and Quality (No. K08HS020672, Dr. Cooke).
The authors have nothing to disclose.