Sepsis is the leading cause of in-hospital mortality in the United States.1 Sepsis is present on admission in 85% of cases, and each hour delay in antibiotic treatment is associated with 4% to 7% increased odds of mortality.2,3 Prompt identification and treatment of sepsis is essential for reducing morbidity and mortality, but identifying sepsis during triage is challenging.2
Risk stratification scores that rely solely on data readily available at the bedside have been developed to quickly identify those at greatest risk of poor outcomes from sepsis in real time. The quick Sequential Organ Failure Assessment (qSOFA) score, the National Early Warning Score 2 (NEWS2), and the Shock Index are easy-to-calculate measures that use routinely collected clinical data that are not subject to laboratory delay. These scores can be incorporated into electronic health record (EHR)-based alerts and can be calculated longitudinally to track the risk of poor outcomes over time. qSOFA was developed to quantify patient risk at bedside in non-intensive care unit (ICU) settings, but there is no consensus about its ability to predict adverse outcomes such as mortality and ICU admission.4-6 The United Kingdom’s National Health Service uses NEWS2 to identify patients at risk for sepsis.7 NEWS has been shown to have similar or better sensitivity in identifying poorer outcomes in sepsis patients compared with systemic inflammatory response syndrome (SIRS) criteria and qSOFA.4,8-11 However, since the latest update of NEWS2 in 2017, there has been little study of its predictive ability. The Shock Index is a simple bedside score (heart rate divided by systolic blood pressure) that was developed to detect changes in cardiovascular performance before systemic shock onset. Although it was not developed for infection and has not been regularly applied in the sepsis literature, the Shock Index might be useful for identifying patients at increased risk of poor outcomes. Patients with higher and sustained Shock Index scores are more likely to experience morbidity, such as hyperlactatemia, vasopressor use, and organ failure, and also have an increased risk of mortality.12-14
Although the predictive abilities of these bedside risk stratification scores have been assessed individually using standard binary cut-points, the comparative performance of qSOFA, the Shock Index, and NEWS2 has not been evaluated in patients presenting to an emergency department (ED) with suspected sepsis. Our objective was to provide a head-to-head comparison of the test characteristics of qSOFA, the Shock Index, and NEWS2 calculated at ED triage for predicting in-hospital mortality and ED-to-ICU admission in patients with suspected sepsis to help health systems and providers select screening measures.
Design and Setting
We conducted a retrospective cohort study of ED patients who presented with suspected sepsis to the University of California San Francisco (UCSF) Helen Diller Medical Center at Parnassus Heights between June 1, 2012, and December 31, 2018. Our institution is a 785-bed academic teaching hospital with approximately 30,000 ED encounters per year. The study was approved with a waiver of informed consent by the UCSF Human Research Protection Program.
We use an Epic-based EHR platform (Epic 2017, Epic Systems Corporation) for clinical care, which was implemented on June 1, 2012. All data elements were obtained from Clarity, the relational database that stores Epic’s inpatient data. The study included encounters for patients age ≥18 years who had blood cultures ordered within 24 hours of ED presentation and administration of intravenous antibiotics within 24 hours. Repeat encounters were treated independently in our analysis.
Outcomes and Measures
We compared the ability of qSOFA, the Shock Index, and NEWS2 to predict in-hospital mortality and admission to the ICU from the ED (ED-to-ICU admission). We used the most abnormal vital signs and clinical assessments gathered within 30 minutes of ED presentation to identify patients who were qSOFA-positive, Shock Index-positive, and NEWS2-positive based on standard cut-points of risk. Data elements used to calculate qSOFA, Shock Index, and NEWS2 included blood pressure, heart rate, respiratory rate, Glasgow Coma Scale (GCS) score, oxygen saturation, requirement for supplemental oxygen, and temperature (Table 1). Patients were considered positive for the respective score if they had a qSOFA score ≥2, a Shock Index >0.7, or a NEWS2 score ≥5 based on triage vital signs.7,15,16 Because the “alert, verbal, confusion, pain, unresponsive” scale is not captured in our EHR, we considered patients to have altered mental status, a criterion used for NEWS2, if they had a GCS score <15, a method that has been used in earlier studies.17,18 Missing assessments were considered normal. Although our primary analysis focused on the scores calculated within 30 minutes of ED presentation, we performed a sensitivity analysis examining scores calculated within 1 hour of ED presentation in the event of a delay in gathering triage vital sign data.
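The triage calculations described above can be illustrated with a minimal Python sketch. It is not the study's code (the analysis was run in Stata on data extracted from Clarity). The class and function names are hypothetical, and the qSOFA component thresholds (respiratory rate ≥22/min, systolic blood pressure ≤100 mm Hg, GCS <15) are assumed from the published Sepsis-3 definition rather than taken from Table 1. NEWS2 is omitted because its banded scoring table is considerably longer. Undocumented values are treated as normal, as in the text, and the sketch additionally assumes that a Shock Index that cannot be computed is not positive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageVitals:
    """Most abnormal values in the triage window; None means not documented."""
    systolic_bp: Optional[float] = None  # mm Hg
    heart_rate: Optional[float] = None   # beats per minute
    resp_rate: Optional[float] = None    # breaths per minute
    gcs: Optional[int] = None            # Glasgow Coma Scale, 3 to 15


def qsofa_score(v: TriageVitals) -> int:
    """One point per abnormal component; a missing component counts as normal."""
    points = 0
    if v.resp_rate is not None and v.resp_rate >= 22:
        points += 1
    if v.systolic_bp is not None and v.systolic_bp <= 100:
        points += 1
    if v.gcs is not None and v.gcs < 15:
        points += 1
    return points


def shock_index(v: TriageVitals) -> Optional[float]:
    """Heart rate divided by systolic blood pressure, or None if not computable."""
    if v.heart_rate is None or v.systolic_bp is None or v.systolic_bp <= 0:
        return None
    return v.heart_rate / v.systolic_bp


def is_qsofa_positive(v: TriageVitals, cutpoint: int = 2) -> bool:
    return qsofa_score(v) >= cutpoint


def is_shock_index_positive(v: TriageVitals, cutpoint: float = 0.7) -> bool:
    # Strict ">" follows the cut-point as stated in this section.
    si = shock_index(v)
    return si is not None and si > cutpoint


# Hypothetical patient with no GCS documented at triage.
patient = TriageVitals(systolic_bp=95, heart_rate=110, resp_rate=24)
print(qsofa_score(patient), is_qsofa_positive(patient))
print(shock_index(patient), is_shock_index_positive(patient))
```

Keeping the cut-points as parameters makes it straightforward to repeat the calculation at alternative thresholds, as in the sensitivity analysis.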
We compared demographic and clinical characteristics of patients who were positive for qSOFA, the Shock Index, and NEWS2. Demographic data were extracted from the EHR and included primary language, age, sex, and insurance status. All International Classification of Diseases (ICD)-9/10 diagnosis codes were pulled from Clarity billing tables. We used the Elixhauser comorbidity groupings19 of ICD-9/10 codes present on admission to identify preexisting comorbidities and underlying organ dysfunction. To estimate burden of comorbid illnesses, we calculated the validated van Walraven comorbidity index,20 which provides an estimated risk of in-hospital death based on documented Elixhauser comorbidities. Admission level of care (acute, stepdown, or intensive care) was collected for inpatient admissions to assess initial illness severity.21 We also evaluated discharge disposition and in-hospital mortality. Index blood culture results were collected, and dates and timestamps of mechanical ventilation, fluid, vasopressor, and antibiotic administration were obtained for the duration of the encounter.
UCSF uses an automated, real-time, algorithm-based severe sepsis alert that is triggered when a patient meets ≥2 SIRS criteria and again when the patient meets severe sepsis or septic shock criteria (ie, ≥2 SIRS criteria in addition to end-organ dysfunction and/or fluid nonresponsive hypotension). This sepsis screening alert was in use for the duration of our study.22
We performed a subgroup analysis among those who were diagnosed with sepsis, according to the 2016 Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) criteria. Sepsis is defined as a change in Sequential Organ Failure Assessment (SOFA) score of ≥2 points within the first 48 hours.23 Additionally, patients meeting Sepsis-3 criteria needed to (1) receive ≥4 days of sequential antibiotic therapy or experience death or discharge to hospice before 4 days of antibiotic therapy or (2) have a validated sepsis discharge billing code. These parameters were added to increase the specificity of our sample.24
All statistical analyses were conducted using Stata 14 (StataCorp). We summarized differences in demographic and clinical characteristics among the populations meeting each severity score but elected not to conduct hypothesis testing because patients could be positive for one or more scores. We calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each score to predict in-hospital mortality and ED-to-ICU admission. To allow comparison with other studies, we also created a composite outcome of either in-hospital mortality or ED-to-ICU admission. To assess score discrimination to predict in-hospital mortality and ED-to-ICU admission, we calculated the area under the receiver operating characteristic curve (AUROC) along with asymptotic normal 95% CI using the “roctab” command considering a binary cut-point, as well as the full range of scores measured in the cohort. The AUROC ranges from 0.50 to 1.00, and a value in the 0.70 to 0.80 range can be considered fair.25 We assessed significant differences between severity score AUROCs using the DeLong method26 implemented through Stata 14’s “roccomp” command. As a sensitivity analysis, we explored whether the standard cut-points for qSOFA, the Shock Index, and NEWS2 provided the highest AUROC in our population by calculating test characteristics for several score cut-points.
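As an illustration of the test characteristics named above, the following Python sketch computes sensitivity, specificity, PPV, and NPV for one score at one cut-point. It is not the Stata code used in the analysis. The function name and the toy data are hypothetical, and a patient is assumed to be positive when the score is at or above the cut-point.

```python
from typing import Dict, Sequence


def characteristics_at_cutpoint(
    scores: Sequence[float], outcomes: Sequence[int], cutpoint: float
) -> Dict[str, float]:
    """Test characteristics when 'positive' means score >= cutpoint.

    outcomes holds 1 if the outcome occurred (for example, in-hospital
    death) and 0 otherwise.
    """
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, outcomes):
        positive = score >= cutpoint
        if positive and outcome == 1:
            tp += 1
        elif positive:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined when a row or column of the 2x2 table is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: eight encounters with a qSOFA-like score.
toy_scores = [0, 0, 1, 1, 2, 2, 3, 0]
toy_outcomes = [0, 0, 0, 1, 1, 0, 1, 0]
for cut in (1, 2, 3):
    print(cut, characteristics_at_cutpoint(toy_scores, toy_outcomes, cut))
```

Looping over several cut-points, as in the last lines, mirrors the sensitivity analysis that examines whether the standard cut-points perform best in a given population.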
Within our sample, 23,837 ED patients had blood cultures ordered within 24 hours of ED presentation and were considered to have suspected sepsis. The mean age of the cohort was 60.8 years, and 1,612 (6.8%) had positive blood cultures. A total of 12,928 patients (54.2%) were found to have sepsis. We documented 1,427 in-hospital deaths (6.0%) and 3,149 (13.2%) ED-to-ICU admissions. At ED triage, 1,921 (8.1%) were qSOFA-positive, 4,273 (17.9%) were Shock Index-positive, and 11,832 (49.6%) were NEWS2-positive. At ED triage, blood pressure, heart rate, respiratory rate, and oxygen saturation were documented in >99% of patients, 93.5% had temperature documented, and 28.5% had GCS recorded. Even when the window of assessment was widened to 1 hour, GCS was documented in only 44.2% of those with suspected sepsis.
Demographic Characteristics and Clinical Course
We observed notable differences when comparing demographic and clinical characteristics among patients who scored positive for the three severity measures at triage (Table 2). Although no hypothesis testing was conducted because patients could meet one or more scores, qSOFA-positive patients were older (median 70, 66, and 64 years, respectively), more likely to have Medicare as the primary payor (67.6%, 59.7%, and 56.6%, respectively), to have chronic renal failure (26.1%, 23.1%, and 23.3%, respectively), to have a greater degree of underlying comorbidities based on the van Walraven Comorbidity Index (median 15, 12, and 11, respectively), and to be admitted to the ICU from the ED (48.1%, 36.3%, and 21.0%, respectively) compared with those positive for the Shock Index or NEWS2.
qSOFA-positive patients received antibiotics more quickly than those who were Shock Index-positive or NEWS2-positive (median 1.5, 1.8, and 2.8 hours after admission, respectively). In addition, those who were qSOFA-positive were more likely to have a positive blood culture (10.9%, 9.4%, and 8.5%, respectively) and to receive an EHR-based diagnosis of sepsis (77.0%, 69.6%, and 60.9%, respectively) than those who were Shock Index- or NEWS2-positive. Those who were qSOFA-positive also were more likely to be mechanically ventilated during their hospital stay (25.4%, 19.2%, and 10.8%, respectively) and to receive vasopressors (33.5%, 22.5%, and 12.2%, respectively). In-hospital mortality also was more common among those who were qSOFA-positive at triage (23.4%, 15.3%, and 9.2%, respectively).
Because both qSOFA and NEWS2 incorporate GCS, we explored baseline characteristics of patients with GCS documented at triage (n = 6,794). These patients were older (median age 63 and 61 years, P < .0001), more likely to be male (54.9% and 53.4%, P = .0031), more likely to have renal failure (22.8% and 20.1%, P < .0001), more likely to have liver disease (14.2% and 12.8%, P = .006), had a higher van Walraven comorbidity score on presentation (median 10 and 8, P < .0001), and were more likely to go directly to the ICU from the ED (20.2% and 10.6%, P < .0001). However, among the 6,397 GCS scores documented at triage, only 1,579 (24.7%) were abnormal.
Test Characteristics of qSOFA, Shock Index, and NEWS2 for Predicting In-hospital Mortality and ED-to-ICU Admission
Among 23,837 patients with suspected sepsis, NEWS2 had the highest sensitivity for predicting in-hospital mortality (76.0%; 95% CI, 73.7%-78.2%) and ED-to-ICU admission (78.9%; 95% CI, 77.5%-80.4%) but had the lowest specificity for in-hospital mortality (52.0%; 95% CI, 51.4%-52.7%) and for ED-to-ICU admission (54.8%; 95% CI, 54.1%-55.5%) (Table 3). qSOFA had the lowest sensitivity for in-hospital mortality (31.5%; 95% CI, 29.1%-33.9%) and ED-to-ICU admission (29.3%; 95% CI, 27.7%-30.9%) but the highest specificity for in-hospital mortality (93.4%; 95% CI, 93.1%-93.8%) and ED-to-ICU admission (95.2%; 95% CI, 94.9%-95.5%). The Shock Index had a sensitivity that fell between qSOFA and NEWS2 for in-hospital mortality (45.8%; 95% CI, 43.2%-48.5%) and ED-to-ICU admission (49.2%; 95% CI, 47.5%-51.0%). The specificity of the Shock Index also was between qSOFA and NEWS2 for in-hospital mortality (83.9%; 95% CI, 83.4%-84.3%) and ED-to-ICU admission (86.8%; 95% CI, 86.4%-87.3%). All three scores exhibited relatively low PPV, ranging from 9.2% to 23.4% for in-hospital mortality and 21.0% to 48.0% for ED-to-ICU admission. Conversely, all three scores exhibited relatively high NPV, ranging from 95.5% to 97.1% for in-hospital mortality and 89.8% to 94.5% for ED-to-ICU admission. The patterns in sensitivity and specificity for in-hospital mortality and ED-to-ICU admission were similar among the 12,928 patients who received an EHR-based sepsis diagnosis, with the tests generally demonstrating lower specificities, higher PPVs, and lower NPVs (Table 3).
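The low PPVs and high NPVs reported above follow from the low prevalence of the outcomes. The Python sketch below (an illustration, not the study's analysis code; the function name is hypothetical) derives PPV and NPV from sensitivity, specificity, and prevalence using Bayes' rule and applies it to the rounded values reported in the text for in-hospital mortality (prevalence 6.0%).

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded inputs taken from the text: in-hospital mortality prevalence of 6.0%.
for name, sens, spec in [("qSOFA", 0.315, 0.934), ("NEWS2", 0.760, 0.520)]:
    ppv, npv = predictive_values(sens, spec, 0.060)
    print(f"{name}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Because the inputs are rounded, the results agree with the reported PPV and NPV ranges only to within rounding. The same function shows how the predictive values would shift in a population with a different outcome prevalence.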
When considering a binary cutoff, the Shock Index exhibited the highest AUROC for in-hospital mortality (0.648; 95% CI, 0.635-0.662) and had a significantly higher AUROC than qSOFA (AUROC, 0.625; 95% CI, 0.612-0.637; P = .0005), but there was no difference compared with NEWS2 (AUROC, 0.640; 95% CI, 0.628-0.652; P = .2112). NEWS2 had a significantly higher AUROC than qSOFA for predicting in-hospital mortality (P = .0227). The Shock Index also exhibited the highest AUROC for ED-to-ICU admission (0.680; 95% CI, 0.617-0.689), which was significantly higher than the AUROC for qSOFA (P < .0001) and NEWS2 (P = .0151). NEWS2 had a significantly higher AUROC than qSOFA for predicting ED-to-ICU admission (P < .0001). Similar findings were seen in patients found to have sepsis. When considering the range of possible scores measured in our cohort, qSOFA and NEWS2 exhibited higher AUROCs for in-hospital mortality and ED-to-ICU admission than the Shock Index among patients with suspected infection and the subgroup with a sepsis diagnosis (Figure). In the sensitivity analysis of alternative cut-points (Appendix), the qSOFA cut-point with the highest AUROC for our institution would be qSOFA > 0 for both in-hospital mortality (AUROC, 0.699; 95% CI, 0.687-0.711) and ED-to-ICU admission (AUROC, 0.716; 95% CI, 0.707-0.724), with 36.5% of the cohort meeting qSOFA. The NEWS2 cut-point with the highest AUROC would be NEWS2 ≥7 for both in-hospital mortality (AUROC, 0.653; 95% CI, 0.640-0.666) and ED-to-ICU admission (AUROC, 0.677; 95% CI, 0.668-0.686), with 20.3% of the cohort meeting NEWS2 at this cut-point. The standard Shock Index cut-point ≥0.7 exhibited the highest AUROC for in-hospital mortality and ED-to-ICU admission at our institution.
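For a score dichotomized at a single cut-point, the ROC curve has only one interior point, and the trapezoidal area under it reduces to the average of sensitivity and specificity. The Python sketch below (an illustration with a hypothetical function name, not the Stata output) applies this identity to the rounded sensitivities and specificities reported in the text. The results agree with the reported binary AUROCs to within rounding.

```python
def binary_auroc(sensitivity: float, specificity: float) -> float:
    """Trapezoidal AUROC for a single cut-point.

    The ROC curve passes through (0, 0), (1 - specificity, sensitivity),
    and (1, 1); the area under those two segments is
    (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2


# Rounded sensitivity and specificity for in-hospital mortality, from the text.
reported = {
    "qSOFA": (0.315, 0.934),
    "Shock Index": (0.458, 0.839),
    "NEWS2": (0.760, 0.520),
}
for name, (sens, spec) in reported.items():
    print(f"{name}: binary AUROC of about {binary_auroc(sens, spec):.3f}")
```

This identity also explains why a cut-point can gain sensitivity yet leave the binary AUROC unchanged when specificity falls by the same amount.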
In this retrospective cohort study of 23,837 patients who presented to the ED with suspected sepsis, the standard qSOFA threshold was met least frequently, followed by the Shock Index and NEWS2. NEWS2 had the highest sensitivity but the lowest specificity for predicting in-hospital mortality and ED-to-ICU admission, making it a challenging bedside risk stratification scale for identifying patients at risk of poor clinical outcomes. When comparing predictive performance among the three scales, qSOFA had the highest specificity and the Shock Index had the highest AUROC for in-hospital mortality and ED-to-ICU admission in this cohort of patients with suspected sepsis. These trends in sensitivity, specificity, and AUROC were consistent among those who met EHR criteria for a sepsis diagnosis. In the analysis of the three scoring systems using all available cut-points, qSOFA and NEWS2 had the highest AUROCs, followed by the Shock Index.
Considering the rapid progression from organ dysfunction to death in sepsis patients, as well as the difficulty establishing a sepsis diagnosis at triage,23 providers must quickly identify patients at increased risk of poor outcomes when they present to the ED. Sepsis alerts often are built using SIRS criteria,27 including the one used for sepsis surveillance at UCSF since 2012,22 but the white blood cell count criterion is subject to a laboratory lag and could lead to a delay in identification. Implementation of a point-of-care bedside score alert that uses readily available clinical data could allow providers to identify patients at greatest risk of poor outcomes immediately at ED presentation and triage, which motivated us to explore the predictive performance of qSOFA, the Shock Index, and NEWS2.
Our study is the first to provide a head-to-head comparison of the predictive performance of qSOFA, the Shock Index, and NEWS2, three easy-to-calculate bedside risk scores that use EHR data collected among patients with suspected sepsis. The Sepsis-3 guidelines recommend qSOFA to quickly identify non-ICU patients at greatest risk of poor outcomes because the measure exhibited predictive performance similar to the more extensive SOFA score outside the ICU.16,23 Although some studies have confirmed qSOFA’s high predictive performance,28-31 our test characteristics and AUROC findings are in line with other published analyses.4,6,10,17 The UK National Health Service is using NEWS2 to screen for patients at risk of poor outcomes from sepsis. Several analyses that assessed the predictive ability of NEWS have reported estimates in line with our findings.4,10,32 The Shock Index was introduced in 1967 and provided a metric to evaluate hemodynamic stability based on heart rate and systolic blood pressure.33 The Shock Index has been studied in several contexts, including sepsis,34 and studies show that a sustained Shock Index is associated with increased odds of vasopressor administration, higher prevalence of hyperlactatemia, and increased risk of poor outcomes in the ICU.13,14
For our study, we were particularly interested in exploring how the Shock Index would compare with more frequently used severity scores such as qSOFA and NEWS2 among patients with suspected sepsis, given the simplicity of its calculation and the easy availability of required data. In our cohort of 23,837 patients, only 159 patients had missing blood pressure and only 71 had missing heart rate. In contrast, both qSOFA and NEWS2 include an assessment of level of consciousness that can be subject to variability in assessment methods and EHR documentation across institutions.11 In our cohort, GCS within 30 minutes of ED presentation was missing in 72% of patients, which could have led to incomplete calculation of qSOFA and NEWS2 if a missing value was not actually within normal limits.
Several investigations relate qSOFA to NEWS but few compare qSOFA with the newer NEWS2, and even fewer evaluate the Shock Index with any of these scores.10,11,18,29,35-37 In general, studies have shown that NEWS exhibits a higher AUROC for predicting mortality, sepsis with organ dysfunction, and ICU admission, often as a composite outcome.4,11,18,37,38 A handful of studies compare the Shock Index to SIRS; however, little has been done to compare the Shock Index to qSOFA or NEWS2, scores that have been used specifically for sepsis and might be more predictive of poor outcomes than SIRS.33 In our study, the Shock Index had a higher AUROC than either qSOFA or NEWS2 for predicting in-hospital mortality and ED-to-ICU admission measured as separate outcomes and as a composite outcome using standard cut-points for these scores.
When selecting a severity score to apply in an institution, it is important to carefully evaluate the score’s test characteristics, in addition to considering the availability of reliable data. Tests with high sensitivity and NPV for the population being studied can be useful to rule out disease or risk of poor outcome, while tests with high specificity and PPV can be useful to rule in disease or risk of poor outcome.39 When considering specificity, qSOFA’s performance was superior to the Shock Index and NEWS2 in our study, but only a small percentage of the population was identified using a cut-point of qSOFA ≥2. If we used qSOFA and applied this standard cut-point at our institution, we could be confident that those identified were at increased risk, but we would miss a significant number of patients who would experience a poor outcome. When considering sensitivity, performance of NEWS2 was superior to qSOFA and the Shock Index in our study, but one-half of the population was identified using a cut-point of NEWS2 ≥5. If we were to apply this standard NEWS2 cut-point at our institution, we would assume that one-half of our population was at risk, which might drive resource use towards patients who will not experience a poor outcome. Although none of the scores exhibited a robust AUROC measure, the Shock Index had the highest AUROC for in-hospital mortality and ED-to-ICU admission when using the standard binary cut-point, and its sensitivity and specificity are between those of qSOFA and NEWS2, potentially making it a score to use in settings where qSOFA and NEWS2 score components, such as altered mentation, are not reliably collected. Finally, our sensitivity analysis varying the binary cut-point of each score within our population demonstrated that the standard cut-points might not be as useful within a specific population and might need to be tailored for implementation, balancing sensitivity, specificity, PPV, and NPV to meet local priorities and ICU capacity.
Our study has limitations. It is a single-center, retrospective analysis, factors that could reduce generalizability. However, it does include a large and diverse patient population spanning several years. Missing GCS data could have affected the predictive ability of qSOFA and NEWS2 in our cohort. We could not reliably perform imputation of GCS because of the high missingness and therefore we assumed missing was normal, as was done in the Sepsis-3 derivation studies.16 Previous studies have attempted to impute GCS and have not observed improved performance of qSOFA to predict mortality.40 Because manually collected variables such as GCS are less reliably documented in the EHR, there might be limitations in their use for triage risk scores.
Although the current analysis focused on the predictive performance of qSOFA, the Shock Index, and NEWS2 at triage, performance of these scores could affect the ED team’s treatment decisions before handoff to the hospitalist team and the expected level of care the patient will receive after in-patient admission. These tests also have the advantage of being easy to calculate at the bedside over time, which could provide an objective assessment of longitudinal predicted prognosis. Future work should assess the longitudinal performance of each of these scores among those with suspected sepsis and determine the impact that using these scores would have on clinical and resource utilization outcomes.
Local priorities should drive selection of a screening tool, balancing sensitivity, specificity, PPV, and NPV to achieve the institution’s goals. qSOFA, Shock Index, and NEWS2 are risk stratification tools that can be easily implemented at ED triage using data available at the bedside. Although none of these scores performed strongly when comparing AUROCs, qSOFA was highly specific for identifying patients with poor outcomes, and NEWS2 was the most sensitive for ruling out those at high risk among patients with suspected sepsis. The Shock Index exhibited a sensitivity and specificity that fell between qSOFA and NEWS2 and also might be considered to identify those at increased risk, given its ease of implementation, particularly in settings where altered mentation is unreliably or inconsistently documented.
The authors thank the UCSF Division of Hospital Medicine Data Core for their assistance with data acquisition.