
Real-time detection of city illness anomalies using the Kinsa thermometer network and highly accurate long-lead forecasts: Technical Approach

Rapidly identifying emerging epidemics remains a massive challenge that limits our ability to curtail outbreaks such as COVID-19. We have developed a method to identify anomalous outbreaks of influenza-like illness (ILI) in real time using Kinsa's county-level illness signals, which are derived from real-time geospatial thermometer data, together with highly accurate 12-week illness forecasts (see Miller et al. 2018, Clin. Infect. Dis.). We flag anomalously high incidence by comparing real-time ILI to expected seasonal influenza trends, where these expectations come from geo-specific influenza forecasts made at a point before any potential outbreak.

We generate a range of expected influenza trends per county using a highly accurate, long-lead influenza forecast trained on our county-level ILI data. This forecasting approach uses geo-specific data to estimate daily, seasonal patterns of influenza transmissibility for each city, allowing us to learn the 'fingerprint' of each region's past influenza outbreaks. It builds on the findings of Dalziel et al. (2018, Science), who demonstrated that individual cities have unique epidemic intensity curves driven by climate and population structure; we use these patterns to build highly accurate long-lead illness forecasts at the county scale. For example, small cities tend to have sharp epidemics, whereas larger cities tend to have 'flattened' incidence curves due to herd immunity effects. We capture these differences in our granular data and use them to build regional forecasts.

We can estimate the daily reproductive number ($R$) directly from our data. $R$ is the average number of people to whom each infected person passes the virus: an $R$ of 2 means each infection leads to 2 new infections. Its value varies by season, by location, and by virus. Our forecasts leverage multiple years of county-specific incidence data to calculate daily $R$ estimates that are unique to each city, using the following equation (equation 1):

$$I_{t+1} = R_t\sum_{k}w_kI_{t-k}$$

where $I$ is Kinsa county-level illness incidence, $R$ is the reproductive number, $w$ gives the probability that an infected individual transmits flu to others at each lag from 1 to 5 days after infection, and $\sum_{k}w_kI_{t-k}$ is a term called effective incidence. We estimate $w$ from published findings on the timing of influenza infection and shedding (Carrat et al. 2008).
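
To make equation 1 concrete, the sketch below rearranges it as $R_t = I_{t+1} / \sum_k w_k I_{t-k}$ and averages the daily estimates across several past seasons to form a per-county 'fingerprint'. It is a minimal illustration under stated assumptions, not Kinsa's implementation: the weights in `W` are hypothetical placeholders rather than values derived from Carrat et al., $k$ is assumed to run from 0 to 4 so the lags span 1 to 5 days before $I_{t+1}$, seasons are assumed to be equal-length arrays already aligned by day of season, and the helper names (`effective_incidence`, `estimate_r`, `seasonal_fingerprint`) are illustrative.

```python
import numpy as np

# Hypothetical generation-interval weights (placeholders, NOT the values from
# Carrat et al.): W[k] is the probability that transmission happens k + 1 days
# before I_{t+1}, i.e. at lags of 1 to 5 days. They sum to 1.
W = np.array([0.10, 0.35, 0.30, 0.15, 0.10])


def effective_incidence(incidence, t, w=W):
    """Return sum_k w_k * I_{t-k} for k = 0 .. len(w) - 1."""
    incidence = np.asarray(incidence, dtype=float)
    window = incidence[t - np.arange(len(w))]  # I_t, I_{t-1}, ..., I_{t-len(w)+1}
    return float(np.dot(w, window))


def estimate_r(incidence, w=W):
    """Estimate daily R_t = I_{t+1} / effective incidence by rearranging equation 1.

    Days without a full lag window, the final day (no I_{t+1}), and days
    with zero effective incidence are left as NaN.
    """
    incidence = np.asarray(incidence, dtype=float)
    r = np.full(len(incidence), np.nan)
    for t in range(len(w) - 1, len(incidence) - 1):
        denom = effective_incidence(incidence, t, w)
        if denom > 0:
            r[t] = incidence[t + 1] / denom
    return r


def seasonal_fingerprint(seasons, w=W):
    """Average daily R across seasons aligned by day of season (hypothetical helper).

    `seasons` is a list of equal-length incidence arrays, one per past season.
    Days with no valid estimate in any season stay NaN.
    """
    r_by_season = np.vstack([estimate_r(s, w) for s in seasons])
    valid = ~np.isnan(r_by_season)
    counts = valid.sum(axis=0)
    sums = np.where(valid, r_by_season, 0.0).sum(axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Because the placeholder weights sum to 1, a flat incidence series yields $R_t = 1$ wherever a full lag window exists, which is a quick sanity check on the indexing.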

Applying equation 1 to each locale's historical data yields an influenza transmission fingerprint: a daily, seasonal profile of $R$. We then predict future influenza incidence by propagating $I$ forward, again using equation 1: for each future date $t$, we substitute the fingerprint's daily estimate of $R_t$ and compute $I_{t+1}$ from the preceding days' incidence.
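
The forward-propagation step can be sketched in the same way: starting from the most recent observed incidence, each new day is computed from equation 1 using the fingerprint's $R$ for that date, and the new value is fed back in as history for later days. As before, this is a hypothetical sketch: `W` holds placeholder weights and `propagate` is an illustrative name, not part of any Kinsa codebase.

```python
import numpy as np

# Hypothetical generation-interval weights for lags of 1 to 5 days (placeholders).
W = np.array([0.10, 0.35, 0.30, 0.15, 0.10])


def propagate(history, r_future, w=W):
    """Forecast incidence by iterating I_{t+1} = R_t * sum_k w_k * I_{t-k}.

    history  : observed incidence up to the forecast date (at least len(w) days)
    r_future : one R_t value per day to forecast, e.g. read from a county's
               seasonal fingerprint for the matching days of the season
    Returns an array with one forecast value per entry of r_future.
    """
    series = [float(x) for x in history]
    if len(series) < len(w):
        raise ValueError("history must cover the full lag window")
    forecast = []
    for r_t in r_future:
        recent = series[-len(w):][::-1]  # I_t, I_{t-1}, ..., I_{t-len(w)+1}
        next_value = float(r_t) * float(np.dot(w, recent))
        series.append(next_value)  # feed the forecast back in as history
        forecast.append(next_value)
    return np.array(forecast)
```

For a 12-week forecast, `r_future` would hold 84 daily $R$ values taken from the county's fingerprint for the corresponding days of the season.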

We account for measurement uncertainty in Kinsa incidence and in the influenza forecasts by running an ensemble of predictions in which we add random noise to the starting values of $I$ at the forecast date. This noise is drawn from a Gaussian distribution whose scale is unique to each region and determined directly from that region's ILI time series. The noise we observe in the Kinsa incidence signal is normally distributed and decreases as sensor penetration in a region increases.
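
One way to sketch this ensemble, under explicit assumptions: the text does not say how each region's noise scale is computed, so `noise_scale` here uses a stand-in estimate (the standard deviation of residuals around a 7-day centered moving average). Each ensemble member perturbs the last five days of observed incidence with Gaussian noise of that scale, clips negative values to zero (incidence cannot be negative), and propagates forward with equation 1. The weights and all function names are hypothetical.

```python
import numpy as np

# Hypothetical generation-interval weights for lags of 1 to 5 days (placeholders).
W = np.array([0.10, 0.35, 0.30, 0.15, 0.10])


def propagate(history, r_future, w=W):
    """Iterate I_{t+1} = R_t * sum_k w_k * I_{t-k} forward from `history`."""
    series = [float(x) for x in history]
    out = []
    for r_t in r_future:
        next_value = float(r_t) * float(np.dot(w, series[-len(w):][::-1]))
        series.append(next_value)
        out.append(next_value)
    return np.array(out)


def noise_scale(incidence, window=7):
    """Stand-in noise estimate: std of residuals around a centered moving average."""
    x = np.asarray(incidence, dtype=float)
    if len(x) < window:
        raise ValueError("series shorter than smoothing window")
    smooth = np.convolve(x, np.ones(window) / window, mode="valid")
    half = window // 2
    residuals = x[half:half + len(smooth)] - smooth  # align each day with its centered mean
    return float(np.std(residuals))


def ensemble_forecast(history, r_future, sigma, n_members=500, seed=0, w=W):
    """Run n_members forecasts from noisy starting values.

    Each member adds N(0, sigma^2) noise to the last len(w) observed days,
    clips negative incidence to zero, then propagates forward.
    Returns an array of shape (n_members, len(r_future)).
    """
    rng = np.random.default_rng(seed)
    start = np.asarray(history, dtype=float)[-len(w):]
    members = np.empty((n_members, len(r_future)))
    for m in range(n_members):
        noisy_start = np.clip(start + rng.normal(0.0, sigma, size=start.shape), 0.0, None)
        members[m] = propagate(noisy_start, r_future, w)
    return members
```

In use, `sigma` would come from the county's own series, e.g. `ensemble_forecast(history, r_future, noise_scale(history))`, so regions with noisier signals (lower sensor penetration) get wider ensembles.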

Figure 1: Forecast error by forecast horizon (weeks) for Kinsa's 12-week illness forecasts and Carnegie Mellon University's (CMU) illness forecasts. Kinsa's forecast errors are on par with best-in-class models, with reduced variation at 1-2 weeks, and stabilize beyond 5 weeks.

Our forecasts are on par with the best-in-class academic models at one to two weeks out and remain highly stable, with consistent error levels, out to 12 weeks (Figure 1). We use these city-specific models to estimate expected influenza trends by making predictions from a point before a suspected outbreak, such as COVID-19, and then compare our real-time signal to these expectations to identify illness trends that are unlikely to be due to normal seasonal influenza. Specifically, we compare current illness levels to the ensemble predictions of expected influenza to estimate the likelihood that current incidence is explained by seasonal influenza dynamics. Any real-time value above the upper bound of the 95% confidence interval for seasonal influenza is flagged as anomalous.
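
The flagging rule can be sketched as a comparison against the ensemble's empirical 97.5th percentile, which is the upper bound of a central 95% interval. `exceedance_fraction` is an additional, hypothetical stand-in for the likelihood estimate mentioned above (the share of ensemble members at or above the observed value); the text does not specify how that likelihood is computed. The ensemble is assumed to be an array of shape (members, days), and both function names are illustrative.

```python
import numpy as np


def flag_anomalies(observed, ensemble, upper_pct=97.5):
    """Flag days whose observed incidence exceeds the ensemble's upper bound.

    observed : real-time incidence, shape (days,)
    ensemble : expected-influenza ensemble forecasts, shape (members, days)
    Returns (flags, upper), where upper is the per-day 97.5th percentile,
    i.e. the top of the central 95% interval across ensemble members.
    """
    observed = np.asarray(observed, dtype=float)
    upper = np.percentile(np.asarray(ensemble, dtype=float), upper_pct, axis=0)
    return observed > upper, upper


def exceedance_fraction(observed, ensemble):
    """Share of ensemble members at or above the observed value on each day.

    Hypothetical stand-in for the likelihood that current incidence is
    explained by seasonal influenza: values near 0 suggest an anomaly.
    """
    observed = np.asarray(observed, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    return (ensemble >= observed[np.newaxis, :]).mean(axis=0)


# Example: 100 members whose values are 0..99 on each of 3 days.
ens = np.tile(np.arange(100.0)[:, None], (1, 3))
flags, upper = flag_anomalies([50.0, 96.0, 99.0], ens)
# flags -> [False, False, True]
```

Using an empirical percentile keeps the rule free of distributional assumptions about the ensemble, at the cost of needing enough members for a stable tail estimate.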

Identifying Early Flu B: Fall 2019
We can test this method by applying it to past anomalous events and checking whether it flags them. A good test case is the abnormal Influenza B outbreak of fall 2019. Influenza B typically circulates later in the season than Influenza A, but this outbreak hit the American South hard and early, leading to the earliest flu season since 2009.

Applying the method, we observe multiple illness anomalies in Houston, TX, where the early-circulating Influenza B strain caused known early-season outbreaks (Figure 2). As expected, we do not observe such anomalies in Brooklyn, NY, since the early Influenza B outbreak originated in the American South (Figure 3). This example shows that the method can successfully identify anomalous outbreaks. It is also important to note that the method identifies anomalous ILI events in general, not COVID-19 in particular. Once an anomalous event is identified, other data sources should always be used to triangulate its cause.

Houston, TX - Flu B Anomaly Detected

Figure 2: Illness anomaly detection for Houston, TX during the fall 2019 Influenza B outbreak. Our method correctly identifies anomalous outbreaks likely due to Influenza B circulation. The blue line shows the median expected-influenza forecast and the shaded range shows the 95% confidence interval.

Brooklyn, NY - No Flu B Anomaly Detected

Figure 3: Illness anomaly detection for Brooklyn, NY, where the same early-season Influenza B outbreak was not observed.

Application to COVID-19 Detection
This method is currently being applied to aid the early detection of potential COVID-19 outbreaks. We forecast expected flu-like illness trends for every county in the continental United States from March 1, before widespread COVID-19 infections were observed, and compare our real-time data to these expectations. In Brooklyn, NY, anomalies begin to appear in the second week of March (Figure 4), indicating where potential COVID-19 outbreaks may be occurring. This method holds promise for real-time illness anomaly detection aimed at identifying emerging pandemics and severe flu outbreaks.

Brooklyn, NY - Late Season Detection

Figure 4: Example of how this method can be used to identify potential COVID-19 outbreaks in real time. Expected influenza forecasts are generated from a point before the outbreak, and values falling outside the 95% confidence limit are flagged as anomalies each day.


Carrat et al. 2008. Time lines of infection and disease in human influenza: A review of volunteer challenge studies. American Journal of Epidemiology. 167: 775-785. https://doi.org/10.1093/aje/kwm375

Dalziel et al. 2018. Urbanization and humidity shape the intensity of influenza epidemics in US cities. Science. 362: 75-79. https://doi.org/10.1126/science.aat6030

Miller AC, Singh I, Koehler E, Polgreen PM. 2018. A Smartphone-Driven Thermometer Application for Real-time Population- and Individual-Level Influenza Surveillance. Clinical Infectious Diseases. 67: 388-397. https://doi.org/10.1093/cid/ciy073