Amid the continued shift from fee-for-service toward value-based payment, policymakers such as the Centers for Medicare & Medicaid Services have initiated strategies to contain spending on episodes of care. This episode focus has led to nationwide implementation of payment models such as bundled payments, which hold hospitals accountable for quality and costs across procedure-based (eg, coronary artery bypass surgery) and condition-based (eg, congestive heart failure) episodes that begin with hospitalization and encompass subsequent hospital and postdischarge care.
Simultaneously, Medicare has increased its emphasis on similarly designed episodes of care (eg, those spanning hospitalization and postdischarge care) using other strategies, such as public reporting and use of episode-based measures to evaluate hospital cost performance. In 2017, Medicare trialed the implementation of six Clinical Episode-Based Payment (CEBP) measures in the national Hospital Inpatient Quality Reporting Program in order to assess hospital and clinician spending on procedure and condition episodes.1,2
CEBP measures reflect episode-specific spending, conveying “how expensive a hospital is” by capturing facility and professional payments for a given episode spanning between 3 days prior to hospitalization and 30 days following discharge. Given standard payment rates used in Medicare, the variation in episode spending reflects differences in quantity and type of services utilized within an episode. Medicare has specified episode-related services and designed CEBP measures via logic and definition rules informed by a combination of claims and procedures-based grouping, as well as by physician input. For example, the CEBP measure for cellulitis encompasses services related to diagnosing and treating the infection within the episode window, but not unrelated services such as eye exams for coexisting glaucoma. To increase clinical salience, CEBP measures are subdivided to reflect differing complexity when possible. For instance, cellulitis measures are divided into episodes with or without major complications or comorbidities and further subdivided into subtypes for episodes reflecting cellulitis in patients with diabetes, patients with decubitus ulcers, or neither.
CEBPs are similar to other spending measures used in payment programs, such as Medicare Spending Per Beneficiary, but are more clinically relevant because their focus on episodes more closely reflects clinical practice. CEBPs and Medicare Spending Per Beneficiary have similar designs (eg, same episode windows) and purpose (eg, to capture the cost efficiency of hospital care).3 However, unlike CEBPs, Medicare Spending Per Beneficiary is a “global” measure that summarizes a hospital’s cost efficiency aggregated across all inpatient episodes rather than representing it based on specific conditions or procedures.4 The limitations of publicly reported global hospital measures (for instance, the poor correlation between hospital performance on distinct publicly reported quality measures5) highlight the potential utility of episode-specific spending measures such as CEBP.
Compared with episode-based payment models, initiatives such as CEBP measures have gone largely unstudied. However, they represent signals of Medicare’s growing commitment to addressing care episodes, tested without potentially tedious rulemaking required to change payment. In fact, publicly reported episode spending measures offer policymakers several interrelated benefits: the ability to rapidly evaluate performance at a large number of hospitals (eg, Medicare scaling up CEBP measures among all eligible hospitals nationwide), the option of leveraging publicly reported feedback to prompt clinical improvements (eg, by including CEBP measures in the Hospital Inpatient Quality Reporting Program), and the platform for developing and testing promising spending measures for subsequent use in formal payment models (eg, by using CEBP measures that possess large variation or cost-reduction opportunities in future bundled payment programs).
Despite these benefits, little is known about hospital performance on publicly reported episode-specific spending measures. We addressed this knowledge gap by providing what is, to our knowledge, the first nationwide description of hospital performance on such measures. We also evaluated which episode components accounted for spending variation in procedural vs condition episodes, examined whether CEBP measures can be used to effectively identify high- vs low-cost hospitals, and compared spending performance on CEBPs vs Medicare Spending Per Beneficiary.
Data and Study Sample
We utilized publicly available data from Hospital Compare, which include information about hospital-level CEBP and Medicare Spending Per Beneficiary performance for Medicare-certified acute care hospitals nationwide.5 Our analysis evaluated the six CEBP measures tested by Medicare in 2017: three conditions (cellulitis, kidney/urinary tract infection [UTI], gastrointestinal hemorrhage) and three procedures (spinal fusion, cholecystectomy and common duct exploration, and aortic aneurysm repair). Per Medicare rules, CEBP measures are calculated only for hospitals with requisite volume for targeted conditions (minimum of 40 episodes) and procedures (minimum of 25 episodes) and are reported on Hospital Compare in risk-adjusted (eg, for age, hierarchical condition categories in alignment with existing Medicare methodology) and payment-standardized form (ie, accounting for wage index, medical education, and disproportionate share hospital payments). Each CEBP encompasses episodes with or without major complications/comorbidities.
For each hospital, CEBP spending is reported as average total episode spending, as well as average spending on specific components. We grouped components into three groups: hospitalization, skilled nursing facility (SNF) use, and other (encompassing postdischarge readmissions, emergency department visits, and home health agency use), with a focus on SNF given existing evidence from episode-based payment models about the opportunity for savings from reduced SNF care. Hospital Compare also provides information about the national CEBP measure performance (ie, average spending for a given episode type among all eligible hospitals nationwide).
To evaluate hospitals’ CEBP performance for specific episode types, we categorized hospitals as either “below average spending” if their average episode spending was below the national average or “above average spending” if spending was above the national average. According to this approach, a hospital could have below average spending for some episodes but above average spending for others.
To compare hospitals across episode types simultaneously, we categorized hospitals as “low cost” if episode spending was below the national average for all applicable measures, “high cost” if episode spending was above the national average for all applicable measures, or “mixed cost” if episode spending was above the national average for some measures and below for others.
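The two categorization rules above can be illustrated with a minimal sketch. It is not the authors' SAS code: the function name, the dictionary layout, and the dollar amounts are hypothetical, and treating spending exactly equal to the national average as "not above" is an assumption, because the text does not specify how ties were handled.

```python
from typing import Dict, Optional

# Hypothetical national averages in dollars (illustrative, not Hospital Compare values).
NATIONAL_AVG: Dict[str, float] = {"cellulitis": 10000.0, "spinal_fusion": 37000.0}


def classify_hospital(spending: Dict[str, Optional[float]],
                      national_avg: Dict[str, float]) -> str:
    """Return 'low cost', 'high cost', or 'mixed cost' for one hospital.

    `spending` maps episode type to the hospital's average episode spending,
    with None where the hospital did not meet the volume threshold.
    """
    above_flags = []
    for episode, amount in spending.items():
        if amount is None:
            continue  # measure not applicable to this hospital
        # Assumption: a tie with the national average counts as "not above".
        above_flags.append(amount > national_avg[episode])
    if not above_flags:
        raise ValueError("hospital has no applicable measures")
    if all(above_flags):
        return "high cost"
    if not any(above_flags):
        return "low cost"
    return "mixed cost"


# A hospital below average on one episode type and above on another is "mixed cost".
example_group = classify_hospital(
    {"cellulitis": 9000.0, "spinal_fusion": 39000.0}, NATIONAL_AVG
)
```

Because only applicable measures are considered, a hospital eligible for a single episode type can only be "low cost" or "high cost" under this rule.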
We also conducted sensitivity analyses using alternative hospital group definitions. For comparisons of specific episode types, we categorized hospitals as “high spending” (top quartile of average episode spending among eligible hospitals) or “other spending” (all others). For comparisons across all episode types, we focused on SNF care and categorized hospitals as “high SNF cost” (top quartile of episode spending attributed to SNF care) and “other SNF cost” (all others). We applied a similar approach to Medicare Spending Per Beneficiary, categorizing hospitals as either “low MSPB cost” if their episode spending was below the national average for Medicare Spending Per Beneficiary or “high MSPB cost” if not.
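The top-quartile grouping used in the sensitivity analyses can be sketched as follows. The function name and spending values are hypothetical, and the "inclusive" quantile method from Python's standard library is an assumption; the percentile definition used in the original SAS analysis is not stated and may give slightly different cut points.

```python
import statistics
from typing import List


def top_quartile_groups(values: List[float]) -> List[str]:
    """Label each hospital 'high spending' if above the 75th percentile, else 'other spending'."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
    return ["high spending" if v > q3 else "other spending" for v in values]


# Hypothetical average episode spending for eight hospitals (dollars).
demo_spending = [9000.0, 9500.0, 10000.0, 10500.0, 11000.0, 11500.0, 12000.0, 12500.0]
demo_groups = top_quartile_groups(demo_spending)
```

The same function could be applied to the share of episode spending attributed to SNF care to obtain the "high SNF cost" and "other SNF cost" groups.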
We assessed variation by describing the distribution of total episode spending across eligible hospitals for each individual episode type, as well as the proportion of spending attributed to SNF care across all episode types. We reported the difference between the 10th and 90th percentile for each distribution to quantify variation. To evaluate how individual episode components contributed to overall spending variation, we used linear regression and applied analysis of variance to each episode component. Specifically, we regressed episode spending on each episode component (hospital, SNF, other) separately and used these results to generate predicted episode spending values for each hospital based on its value for each spending component. We then calculated the differences (ie, residuals) between predicted and actual total episode spending values. We plotted residuals for each component, with lower residual plot variation (ie, a flatter curve) representing larger contribution of a spending component to overall spending variation.
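As a rough illustration of the two steps described above, the sketch below computes a 10th-to-90th percentile gap and the residuals from regressing total episode spending on a single component. It is a simplified stand-in, not the authors' implementation: the function names and numbers are hypothetical, the regression is ordinary least squares with one predictor, residuals are taken as actual minus predicted, and the spread of residuals is summarized with a population standard deviation rather than a plot.

```python
import statistics
from typing import List


def p10_p90_gap(values: List[float]) -> float:
    """Difference between the 90th and 10th percentile (inclusive method, an assumption)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[8] - deciles[0]


def component_residuals(total: List[float], component: List[float]) -> List[float]:
    """Regress total spending on one component and return actual minus predicted values."""
    mean_x = statistics.fmean(component)
    mean_y = statistics.fmean(total)
    sxx = sum((x - mean_x) ** 2 for x in component)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(component, total))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(component, total)]


# Hypothetical component spending for four hospitals (dollars).
hospital = [8000.0, 8100.0, 7900.0, 8050.0]
snf = [1000.0, 2000.0, 3000.0, 4000.0]
other = [500.0, 600.0, 550.0, 450.0]
total = [h + s + o for h, s, o in zip(hospital, snf, other)]

# Smaller residual spread means the component accounts for more of the variation in total spending.
snf_spread = statistics.pstdev(component_residuals(total, snf))
hospital_spread = statistics.pstdev(component_residuals(total, hospital))
```

In this made-up example SNF spending varies far more than the other components, so the residuals from the SNF regression are much less dispersed than those from the hospitalization regression.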
Pearson correlation coefficients were used to assess within-hospital CEBP correlation (ie, the extent to which performance was hospital specific). We evaluated if and how components of spending varied across hospitals by comparing spending groups (for individual episode types) and cost groups (for all episode types). To test the robustness of these categories, we conducted sensitivity analyses using high spending vs other spending groups (for individual episode types) and high SNF cost vs other SNF cost groups (for all episode types).
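The within-hospital correlation between two CEBP measures can be sketched as a Pearson coefficient computed over hospitals that have both measures. The function name and data are hypothetical, and dropping hospitals that lack either measure (pairwise deletion) is an assumption of this sketch.

```python
import math
from typing import List, Optional


def pearson(xs: List[Optional[float]], ys: List[Optional[float]]) -> float:
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two hospitals with both measures")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spending on two episode types; None marks a hospital below the volume threshold.
cellulitis = [9000.0, 10000.0, None, 11000.0, 9500.0]
kidney_uti = [9800.0, 10100.0, 10500.0, 10900.0, 11200.0]
r_demo = pearson(cellulitis, kidney_uti)
```

A coefficient near 0 would indicate that a hospital's relative spending on one episode type says little about its spending on the other.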
To assess concordance between CEBP and Medicare Spending Per Beneficiary, we cross tabulated hospital CEBP performance (high vs low vs mixed cost) and Medicare Spending Per Beneficiary performance (high vs low MSPB cost). This approach allowed us to quantify the number of hospitals that have concordant performance for both types of spending measures (ie, high cost or low cost on both) and the number with discordant performance (eg, high cost on one spending measure but low cost on the other). We used Pearson correlation coefficients to assess correlation between CEBP and Medicare Spending Per Beneficiary, with evaluation of CEBP performance in aggregate form (ie, hospitals’ average CEBP performance across all eligible episode types) and by individual episode types.
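The cross tabulation described above can be sketched with a simple counter over paired group labels. The function names and example labels are hypothetical, and counting every cell other than "low on both" and "high on both" as discordant (including mixed-cost hospitals) is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cross_tabulate(cebp_groups: List[str], mspb_groups: List[str]) -> Counter:
    """Count hospitals in each (CEBP group, MSPB group) cell."""
    return Counter(zip(cebp_groups, mspb_groups))


def concordance_counts(table: Counter) -> Dict[str, int]:
    """Summarize a cross tabulation into concordant-low, concordant-high, and discordant counts."""
    total = sum(table.values())
    both_low = table[("low cost", "low MSPB cost")]
    both_high = table[("high cost", "high MSPB cost")]
    # Assumption: all remaining cells, including mixed-cost hospitals, are discordant.
    return {
        "both_low": both_low,
        "both_high": both_high,
        "discordant": total - both_low - both_high,
        "total": total,
    }


# Hypothetical labels for five hospitals.
cebp_demo = ["low cost", "high cost", "mixed cost", "low cost", "high cost"]
mspb_demo = ["low MSPB cost", "high MSPB cost", "low MSPB cost",
             "high MSPB cost", "high MSPB cost"]
summary_demo = concordance_counts(cross_tabulate(cebp_demo, mspb_demo))
```

Dividing each count by `total` gives the concordant and discordant shares.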
Chi-square and Kruskal-Wallis tests were used to compare categorical and continuous variables, respectively. To compare spending amounts, we evaluated the distribution of total episode spending (Appendix Figure 1) and used ordinary least squares regression with spending as the dependent variable and hospital group, episode components, and their interaction as independent variables. Because CEBP dollar amounts are reported through Hospital Compare on a risk-adjusted and payment-standardized basis, no additional adjustments were applied. Analyses were performed using SAS version 9.4 (SAS Institute; Cary, NC) and all tests of significance were two-tailed at alpha=0.05.
Of 3,129 hospitals, 1,778 achieved minimum thresholds and had CEBPs calculated for at least one of the six CEBP episode types.
Variation in CEBP Performance
For each episode type, spending varied across eligible hospitals (Appendix Figure 2). In particular, the differences between the 10th and 90th percentile values for cellulitis, kidney/UTI, and gastrointestinal hemorrhage episodes were $2,873, $3,514, and $2,982, respectively. Differences were greater for procedural episodes of aortic aneurysm ($17,860), spinal fusion ($11,893), and cholecystectomy ($3,689). Evaluated across all episode types, the proportion of episode spending attributed to SNF care also varied across hospitals (Appendix Figure 3), with a difference of 24.7 percentage points between the 10th (4.5%) and 90th (29.2%) percentile values.
Residual plots demonstrated differences in which episode components accounted for variation in overall spending. For aortic aneurysm episodes, variation in the SNF episode component best explained variation in episode spending and thus had the lowest residual plot variation, followed by other and hospital components (Figure). Similar patterns were observed for spinal fusion and cholecystectomy episodes. In contrast, for cellulitis episodes, all three components had comparable residual-plot variation, which indicates that the variation in the components explained episode spending variation similarly (Figure)—a pattern reflected in kidney/UTI and gastrointestinal hemorrhage episodes.
Correlation in Performance on CEBP Measures
Across hospitals in our sample, within-hospital correlations were generally low (Appendix Table 1). In particular, correlations ranged from –0.079 (between performance on aortic aneurysm and kidney/UTI episodes) to 0.42 (between performance on kidney/UTI and cellulitis episodes), with a median correlation coefficient of 0.13. Within-hospital correlations ranged from 0.037 to 0.28 when considered between procedural episodes and from 0.33 to 0.42 when considered between condition episodes. When assessed among the subset of 1,294 hospitals eligible for at least two CEBP measures, correlations were very similar (ranging from –0.080 to 0.42). Additional analyses among hospitals with more CEBPs (eg, all six measures) yielded correlations that were similar in magnitude.
CEBP Performance by Hospital Groups
Overall spending on specific episode types varied across hospital groups (Table). Spending for aortic aneurysm episodes was $42,633 at hospitals with above average spending and $37,730 at those with below average spending, while spending for spinal fusion episodes was $39,231 at those with above average spending and $34,832 at those with below average spending. In comparison, spending at hospitals deemed above and below average spending for cellulitis episodes was $10,763 and $9,064, respectively, and $11,223 and $9,161 at hospitals deemed above and below average spending for kidney/UTI episodes, respectively.
Spending on specific episode components also differed by hospital group (Table). Though the magnitude of absolute spending amounts and differences varied by specific episode, hospitals with above average spending tended to spend more on SNF than did those with below average spending. For example, hospitals with above average spending for cellulitis episodes spent an average of $2,564 on SNF (24% of overall episode spending) vs $1,293 (14% of episode spending) among those with below average spending. Similarly, hospitals with above and below average spending for kidney/UTI episodes spent $4,068 (36% of episode spending) and $2,232 (24% of episode spending) on SNF, respectively (P < .001 for both episode types). Findings were qualitatively similar in sensitivity analyses (Appendix Table 3).
Among hospitals in our sample, we categorized 481 as high cost (27%), 452 as low cost (25%), and 845 as mixed cost (48%), with hospital groups distributed broadly nationwide (Appendix Figure 4). Evaluated on performance across all six episode types, hospital groups also demonstrated differences in spending by cost components (Table). In particular, spending in SNF ranged from 18.1% of overall episode spending among high-cost hospitals to 10.7% among mixed-cost hospitals and 9.2% among low-cost hospitals. Additionally, spending on hospitalization accounted for 83.3% of overall episode spending among low-cost hospitals, compared with 81.2% and 73.4% among mixed-cost and high-cost hospitals, respectively (P < .001). Comparisons were qualitatively similar in sensitivity analyses (Appendix Table 2).
Comparison of CEBP and Medicare Spending Per Beneficiary Performance
Correlation between Medicare Spending Per Beneficiary and aggregated CEBPs was 0.42 and, for individual episode types, ranged between 0.14 and 0.36 (Appendix Table 3). There was low concordance between hospital performance on CEBP and Medicare Spending Per Beneficiary. Across all eligible hospitals, only 16.3% (290/1778) had positive concordance between performance on the two measure types (ie, low cost for both), while 16.5% (293/1778) had negative concordance (ie, high cost for both). There was discordant performance in most instances (67.2%; 1195/1778), reflecting favorable performance on one measure type but not the other.
To our knowledge, this study is the first to describe hospitals’ episode-specific spending performance nationwide. It demonstrated significant variation across hospitals driven by different episode components for different episode types. It also showed low correlation between individual episode spending measures and poor concordance between episode-specific and global hospital spending measures. Two practice and policy implications are noteworthy.
First, our findings corroborate and build upon evidence from bundled payment programs about the opportunity for hospitals to improve their cost efficiency. Findings from bundled payment evaluations of surgical episodes suggest that the major area for cost savings is in the reduction of institutional post-acute care use such as that of SNFs.7-9 We demonstrated similar opportunity in a national sample of hospitals, finding that, for the three evaluated procedural CEBPs, SNF care accounted for more variation in overall episode spending than did other components. While variation may imply opportunity for greater efficiency and standardization, it is important to note that variation itself is not inherently problematic. Additional studies are needed to distinguish between warranted and unwarranted variation in procedural episodes, as well as identify strategies for reducing the latter.
Though bundled payment evaluations have predominantly emphasized procedural episodes, existing evidence suggests that participation in medical condition bundles has not been associated with cost savings or utilization changes.7-15 Findings from our analysis of variance—that there appear to be smaller variation-reduction opportunities for condition episodes than for procedural episodes—offer insight into this issue. Existing episodes are initiated by hospitalization and extend into the postacute period, a design that may not afford substantial post-acute care savings opportunities for condition episodes. This is an important insight as policymakers consider how to best design condition-based episodes in the future (eg, whether to use non–hospital based episode triggers). Future work should evaluate whether our findings reflect inherent differences between condition and procedural episodes16 or whether interventions can still optimize SNF care for these episodes despite smaller variation.
Second, our results highlight the potential limitations of global performance measures such as Medicare Spending Per Beneficiary. As a general measure of hospital spending, Medicare Spending Per Beneficiary is based on the premise that hospitals can be categorized as high or low cost with consideration of all inpatient episodic care. However, our analyses suggest that hospitals may be high cost for certain episodes and low cost for others, a fact highlighted by the low correlation and high discordance observed between hospital CEBP and Medicare Spending Per Beneficiary performance. Because overarching measures may miss spending differences related to underlying clinical scenarios, episode-specific spending measures would provide an important perspective that complements global measures for assessing hospital cost performance, particularly in an era of value-based payments. Policymakers should consider prioritizing the development and implementation of such measures.
Our study has limitations. First, it is descriptive in nature, and future work should evaluate the association between episode-specific spending measure performance and clinical and quality outcomes. Second, we evaluated all CEBP-eligible hospitals nationwide to provide a broad view of episode-specific spending. However, future studies should assess performance among hospital subtypes, such as vertically integrated or safety-net organizations, because they may be more or less able to perform on these spending measures. Third, though findings may not be generalizable to other clinical episodes, our results were qualitatively consistent across episode types and broadly consistent with evidence from episode-based payment models. Fourth, we analyzed cost from the perspective of utilization and did not incorporate price considerations, which may be more relevant for commercial insurers than they are for Medicare.
Nonetheless, the emergence of CEBPs reflects the ongoing shift in policymaker attention toward episode-specific spending. In particular, though further scale or use of CEBP measures has been put on hold amid other payment reform changes, their nationwide implementation in 2017 signals Medicare’s broad interest in evaluating all hospitals on episode-specific spending efficiency, in addition to other facets of spending, quality, safety, and patient experience. Importantly, such efforts complement other ongoing nationwide initiatives for emphasizing episode spending, such as use of episode-based cost measures within the Merit-Based Incentive Payment System17 to score clinicians and groups in part based on their episode-specific spending efficiency. Insight about episode spending performance could help hospitals prepare for environments with increasing focus on episode spending as policymakers incorporate this perspective into quality and value-based payment policies.
Dr. Liao reports textbook royalties from Wolters Kluwer and personal fees from Kaiser Permanente Washington Research Institute, none of which are related to this manuscript. Dr. Zhou has nothing to disclose. Dr. Navathe reported receiving grants from Hawaii Medical Service Association, Anthem Public Policy Institute, Healthcare Research and Education Trust, Cigna, and Oscar Health; personal fees from Navvis Healthcare, and Agathos, Inc.; personal fees and equity from NavaHealth; equity from Embedded Healthcare; speaking fees from the Cleveland Clinic; personal fees from the Medicare Payment Advisory Commission; and an honorarium from Elsevier Press, as well as serving as a board member of Integrated Services Inc. without compensation, none of which are related to this manuscript.