Journalists' Guide to COVID Data


Watch a press conference, turn on a newscast, or overhear just about any phone conversation these days and you’ll hear mayors discussing R values, reporters announcing new fatalities and separated families comparing COVID case rolling averages in their counties. As coronavirus resurges across the country, medical data is no longer just the purview of epidemiologists (though a quick glance at any social media comments section shows an unlikely simultaneous surge in the number of virology experts and statisticians).

Journalists reporting on COVID, however, have a particular obligation to understand the data, to add context and to acknowledge uncertainty when reporting the numbers.

“Journalism requires more than merely reporting remarks, claims or comments. Journalism verifies, provides relevant context, tells the rest of the story and acknowledges the absence of important additional information.” - RTDNA Code of Ethics

This guide to common COVID metrics is designed to help journalists know how each data point is calculated, what it means and, importantly, what it doesn’t mean.


Vaccine Efficacy

How it’s calculated: Efficacy is typically reported as a percent, indicating how much less likely someone is to develop COVID symptoms after being vaccinated than someone who hasn’t been vaccinated. For example, 95% efficacy means a 95% lower risk of COVID symptoms after vaccination. It’s important to note that the time frame matters, as different vaccines take different periods to reach peak efficacy.

Another key point when reporting efficacy is that different vaccine producers defined post-vaccine cases differently, and of the three major vaccine makers’ trials so far, none studied asymptomatic COVID cases. On the other hand, three currently approved vaccines all report 100% efficacy in preventing severe disease (requiring hospitalization or resulting in death) six to seven weeks after vaccination.

What it tells you: In a clinical trial setting, how much vaccination decreases the chances of experiencing COVID symptoms.

What it doesn’t tell you: 95% efficacy (for example) does not mean 5% of vaccinated people will still get COVID. It is also not yet known whether vaccines can prevent transmission from symptomatic or asymptomatic people.
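To see why 95% efficacy does not mean 5% of vaccinated people get sick, here is a minimal sketch using made-up trial numbers. The function name, the group sizes and the case counts are all hypothetical, and actual trials use more detailed methods (such as accounting for how long each participant was followed).

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo):
    """Efficacy = 1 - (risk among vaccinated / risk among unvaccinated)."""
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_placebo = cases_placebo / n_placebo
    return 1 - risk_vaccinated / risk_placebo

# Hypothetical trial with 20,000 participants in each group
efficacy = vaccine_efficacy(cases_vaccinated=10, n_vaccinated=20_000,
                            cases_placebo=200, n_placebo=20_000)

# Share of vaccinated participants who developed symptoms during the trial
share_vaccinated_sick = 10 / 20_000

print(f"Efficacy: {efficacy:.0%}")
print(f"Vaccinated participants with symptoms: {share_vaccinated_sick:.2%}")
```

In this invented example, efficacy works out to 95% even though only 0.05% of vaccinated participants developed symptoms. Efficacy compares the two groups' risks; it is not the share of vaccinated people who get sick.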

More about efficacy data (Source: Live Science)

Number of Positive Cases

Tells you: The number of tests whose samples have come back positive.

How it’s calculated:

  • Check your source to see if case data represents only viral tests (which indicate active infection at the time the sample was taken) or also includes antibody tests (which indicate previous infection).
  • Check your source to see if case data is reported daily or cumulatively. Cumulative cases represent the total number of cases since a given start date, while daily totals represent only the number of new cases reported on a given day.
  • Check your source to see if case data is raw or averaged. If case numbers are reported raw, they may fluctuate depending on the day of the week. For example, cases may jump in the middle of the week as the testing rate and the rate at which health facilities process tests increase. Rolling or trailing averages, on the other hand, give you the average over the last set period, such as the last 7 or 14 days. Rolling averages can help identify trends by minimizing the impact of outliers.
  • Check your source to see if multiple positive tests for the same individual are counted. Most jurisdictions count only one positive test per individual, which leaves some margin for error (such as if someone is tested in more than one reporting area) but the data is more closely representative of the number of people who have tested positive.
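
As a rough illustration of the difference between raw daily counts, cumulative totals and rolling averages, here is a short sketch. The daily counts are invented and the function name is hypothetical; check how your own source defines its averaging window.

```python
def trailing_average(daily_counts, window=7):
    """Return the trailing mean for each day that has a full window of data."""
    averages = []
    for end in range(window, len(daily_counts) + 1):
        window_values = daily_counts[end - window:end]
        averages.append(sum(window_values) / window)
    return averages

# Hypothetical daily new-case counts for two weeks, oldest first
# (weekend dips followed by midweek spikes)
daily_new_cases = [40, 35, 90, 120, 110, 95, 50, 45, 38, 100, 130, 125, 105, 55]

cumulative_cases = sum(daily_new_cases)            # total since the start date
rolling_7_day = trailing_average(daily_new_cases)  # one value per full 7-day window

print("Cumulative cases:", cumulative_cases)
print("7-day averages:", [round(value, 1) for value in rolling_7_day])
```

The raw counts swing from 35 to 130 depending on the day, while the 7-day averages move much more gradually, which makes an underlying rise or fall easier to see.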

Doesn’t tell you:

  • The sensitivity or accuracy of the tests. Some tests are more accurate than others, and less accurate tests can produce more false negative results.
  • The infection period of those who tested positive. Since a positive test represents only confirmation of the virus in a given sample, it isn’t able to show when people who test positive became infected or, as test results often take days or weeks, how the infection status of those tested may have changed since testing.
  • Context about the population as a whole in the reporting area. For example, 100 positive tests in a county with a population of 1,000 have very different implications than 100 positive tests in a county with 1,000,000 residents.
  • The number or portion of untested people who are also infected.
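
One common way to add population context is to convert raw counts to a rate per 100,000 residents. The sketch below applies that to the two counties in the example above; the function name is hypothetical.

```python
def cases_per_100k(cases, population):
    """Convert a raw case count to cases per 100,000 residents."""
    return cases * 100_000 / population

small_county = cases_per_100k(cases=100, population=1_000)
large_county = cases_per_100k(cases=100, population=1_000_000)

print(f"Small county: {small_county:,.0f} cases per 100,000 residents")
print(f"Large county: {large_county:,.0f} cases per 100,000 residents")
```

The same 100 positive tests translate to 10,000 per 100,000 residents in the small county and 10 per 100,000 in the large one.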

Positivity Rate

Tells you: The percent of tests that come back positive.

A high positivity rate suggests that testing may be limited to the most likely cases (such as people who are symptomatic or have known exposure) and many potentially positive people may not be getting tested. A low positivity rate suggests that testing is reaching not only the most likely cases but also people more likely to test negative (like asymptomatic people not known to be exposed).

How it’s calculated: Positive tests divided by total tests.

Check your source to see if multiple positive or negative tests for the same person are included, eliminated or partially included. For example, including multiple negative tests for the same individual, but eliminating multiple positive tests for one individual, can artificially lower the positivity rate.
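
The sketch below shows how the handling of repeat tests can change the positivity rate for the same set of results. The test records and function names are invented for illustration; they are not any jurisdiction's actual method.

```python
# Hypothetical test records: (person, result)
tests = [
    ("A", "positive"), ("A", "positive"),                       # one person, two positives
    ("B", "negative"), ("B", "negative"), ("B", "negative"),    # one person, three negatives
    ("C", "negative"),
    ("D", "positive"),
    ("E", "negative"),
]

def rate_all_tests(records):
    """Every test counts: positive tests divided by total tests."""
    positives = sum(1 for _, result in records if result == "positive")
    return positives / len(records)

def rate_dedupe_positives_only(records):
    """Each positive person counts once, but every negative test still counts."""
    positive_people = {person for person, result in records if result == "positive"}
    negatives = sum(1 for _, result in records if result == "negative")
    return len(positive_people) / (len(positive_people) + negatives)

def rate_per_person(records):
    """Each person counts once; positive if any of their tests was positive."""
    people = {person for person, _ in records}
    positive_people = {person for person, result in records if result == "positive"}
    return len(positive_people) / len(people)

print(f"All tests counted:        {rate_all_tests(tests):.1%}")
print(f"Positives de-duplicated:  {rate_dedupe_positives_only(tests):.1%}")
print(f"One result per person:    {rate_per_person(tests):.1%}")
```

With these invented records, the three approaches give 37.5%, about 28.6% and 40.0%, which is why the counting method matters when comparing positivity rates across sources.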

Doesn’t tell you:

  • Whether someone with a negative test had been infected prior to the negative test or has been infected since.
  • The percent of the population as a whole that is infected, since it does not account for the portion of untested people who are also infected or provide context about the population as a whole in the reporting area.
  • How ill people are.

Find more information about testing data here.

Infection Rate

Also known as R value or reproduction value

Tells you: An estimate of the extent of viral spread based on how many people, on average, an infected person infects. A number greater than one in general suggests the number of cases is increasing. A number less than one suggests the spread is slowing.
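
To illustrate why values above and below one matter, here is a deliberately simplified sketch in which every infected person infects exactly R others in each round of transmission. The function name and starting numbers are hypothetical, and real outbreaks do not follow this pattern exactly, since behavior, immunity and other factors change over time.

```python
def projected_cases(initial_cases, r_value, generations):
    """New cases in each round of transmission under a constant R (a simplification)."""
    return [initial_cases * r_value ** g for g in range(generations + 1)]

growing = projected_cases(initial_cases=100, r_value=1.2, generations=5)
shrinking = projected_cases(initial_cases=100, r_value=0.8, generations=5)

print("R = 1.2:", [round(cases) for cases in growing])
print("R = 0.8:", [round(cases) for cases in shrinking])
```

Starting from 100 cases, five rounds at R = 1.2 lead to roughly 249 new cases per round, while five rounds at R = 0.8 lead to roughly 33.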

How it’s calculated: R value is indirectly and retrospectively estimated based on other data points, like the percentage of positive tests and number of deaths. Calculation methods vary and each includes different assumptions. Here’s one example and here’s another.

Doesn’t tell you:

  • The exact rate of virus transmission. Because R is based on assumptions and calculations from other data points (which are likely imperfect or insufficient measures as well), it is a useful estimate rather than a strict guidepost. The less complete and accurate the underlying data, the less useful the estimate.
  • How the infection rate is changing over time or its current value (since estimates based on past data will lag).
  • Exactly how many people each infected person spreads infection to (since this will vary based on factors including the individuals’ behavior and viral loads).
  • Whether there are bigger outbreaks more local than the area measured or whether a localized outbreak (like at a factory) or superspreading event is skewing the estimate.

Hospitalizations

Tells you: The number of hospital beds occupied by confirmed (or in some cases confirmed and suspected) COVID cases.

How it’s calculated:

Doesn’t tell you:

  • How close to capacity medical facilities are, unless accompanied by capacity data.
  • The total number of people who have been hospitalized with confirmed or suspected COVID, as hospitalization is typically reported as number of beds occupied on a given day.
  • The age, race or gender distribution of hospitalized patients, unless accompanied by demographic data, or how the demographic distribution compares to the reporting area overall. This data can provide key context. For example, the COVID Tracking Project reports that “Nationwide, Black people are dying at 2.5 times the rate of white people.” Find more in the COVID Racial Data Tracker.
  • The number or rate of confirmed or suspected COVID patients who are receiving outpatient medical care or not seeking medical care.

Recovered Cases

How it’s calculated: Many jurisdictions do not report recovered cases specifically because it is difficult to calculate.

If your city or county does report recovered cases, check your source to see whether the data is a complete count or an estimate. A large number of cases makes it difficult to follow up regularly with each positive individual to get an exact count, so many jurisdictions use calculated estimates. For example, the Texas Department of State Health Services estimates recovered cases by:

  • “Including total confirmed cases
  • Removing any fatalities
  • Estimating that approximately 20% of remaining cases required hospitalization and approximately 80% of remaining cases did not receive hospitalization (based on published study of trends in China)
  • Estimating that recovery time for hospitalized patients is approximately 32 days, and recovery time for non-hospitalized patients is approximately 14 days”
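
The steps quoted above can be sketched roughly as follows. This is one possible reading of the description, not the department's actual calculation: the function name, the grouping of cases by days since confirmation, the per-group fatality counts and all the numbers are assumptions made for illustration.

```python
def estimate_recovered(case_groups,
                       hospitalized_share=0.20,
                       hospitalized_recovery_days=32,
                       non_hospitalized_recovery_days=14):
    """Estimate recoveries from groups of (days_since_confirmed, confirmed, fatalities)."""
    recovered = 0.0
    for days_since_confirmed, confirmed, fatalities in case_groups:
        remaining = confirmed - fatalities  # remove fatalities first
        if days_since_confirmed >= non_hospitalized_recovery_days:
            recovered += (1 - hospitalized_share) * remaining  # ~80% not hospitalized
        if days_since_confirmed >= hospitalized_recovery_days:
            recovered += hospitalized_share * remaining        # ~20% hospitalized
    return recovered

# Hypothetical groups: (days since confirmation, confirmed cases, fatalities)
case_groups = [
    (40, 1000, 20),  # old enough for both groups to count as recovered
    (20, 500, 10),   # only the non-hospitalized share counts as recovered
    (5, 300, 0),     # too recent for anyone to count as recovered
]

estimated_recovered = estimate_recovered(case_groups)
print("Estimated recovered cases:", round(estimated_recovered))
```

With these invented figures the estimate is about 1,372 recoveries out of 1,800 confirmed cases. Note that nobody in the calculation was actually followed up with, which is why the result is an estimate rather than a count.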

Check your source to see if your reporting area uses a test-based or symptom-based measure of recovered cases. Some jurisdictions may require one negative test, or two negative tests in a row in a given time frame, for a case to be counted as recovered. However, the CDC in general no longer recommends using testing to determine precautions like isolation, and the ongoing limited availability of testing means most jurisdictions that do report recoveries use a symptom-based estimate.

In some cases, recovery is defined as a given time period after symptoms begin to decrease, once 24 consecutive hours have passed with no fever and no fever-reducing medication, or once a patient is no longer isolated.

Tells you: Typically, an estimate of the number of people who tested positive but are no longer treated as active cases.

Doesn’t tell you:

  • The number of infected – symptomatic or asymptomatic – who were never tested or treated.
  • The number of infected people who are potentially infectious, since recovery estimates are often similar but not identical to the CDC’s current estimates of infectiousness.  
  • How ill those now recovered were.
  • Whether the recovered have any residual disability or injury.

Fatalities

How it’s calculated: Check your source to see if your reporting area reports confirmed or probable deaths. Confirmed deaths typically require a positive test. Probable deaths may be required to meet clinical conditions or have other evidence.

Check your source to see where cause of death is coming from – whether a death certificate signed by a physician in a hospital, a medical examiner or a coroner. Note that deaths are often reported as from an immediate cause (like respiratory distress) with additional information related to the reason (such as COVID-19) and sometimes additional contributing factors. Reported causes of death can be imprecise because it’s often difficult to determine causation. Some patients may die while infected without it being possible to determine conclusively whether and to what extent the virus contributed to the death. In other cases, deaths from strokes or heart attacks may be related to infection but could be difficult to identify as such.

Tells you: The number of deaths confirmed to be, or in some cases presumed to be, due to COVID.

Doesn’t tell you:

  • Depending on the reporting source and level of detail released, whether any underlying physical conditions may have contributed.
  • How many people are ill or recovering.
  • The demographic information of the deceased, unless demographic breakdowns are also provided.
  • The personal stories of those who have died or how their loved ones are affected.
  • Context about the overall population or infection rate of the reporting area.

Fatality Rate

How it’s calculated: Can be measured either as Infection Fatality Rate (proportion of infected people who die as a result of infection) or Case Fatality Rate (proportion of reported cases that result in death).

Check your source to see which measure is used and what method is used to obtain it. For example, some models include antibody test data to try to account for limited testing and get a better picture of total infections. However, the data’s usefulness will depend on factors like the sample size. Others try to calculate a more current rate by including estimates for the amount of time between an initial positive case confirmation and reported fatality.
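
To show how much the choice of measure can matter, the sketch below computes both rates from the same number of deaths. All figures are invented, and the estimate of total infections is simply assumed here; in practice it would come from a model or antibody survey with its own uncertainty.

```python
def fatality_rate(deaths, denominator):
    """Proportion of the denominator group (cases or infections) that died."""
    return deaths / denominator

# Hypothetical reporting area
deaths = 50
reported_cases = 2_000          # people with a confirmed positive test
estimated_infections = 10_000   # assumed estimate that includes untested infections

case_fatality_rate = fatality_rate(deaths, reported_cases)
infection_fatality_rate = fatality_rate(deaths, estimated_infections)

print(f"Case Fatality Rate:      {case_fatality_rate:.1%}")
print(f"Infection Fatality Rate: {infection_fatality_rate:.1%}")
```

In this example the Case Fatality Rate is 2.5% and the Infection Fatality Rate is 0.5%. The number of deaths is identical; only the denominator differs.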

Other experts report estimates of excess deaths, which according to the CDC “are typically defined as the difference between the observed numbers of deaths in specific time periods and expected numbers of deaths in the same time periods.” Calculation methods here also vary for both number of deaths and expected numbers, but the metric can illustrate the broad impact of the pandemic on increasing mortality.
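
Following the definition quoted above, a bare-bones sketch of an excess deaths calculation might look like this. The weekly figures are invented, and using a simple average of prior years as the expected number is an assumption made for illustration; published estimates rely on more sophisticated statistical models.

```python
def expected_deaths(prior_years):
    """Average deaths for each week across prior years (a simplifying assumption)."""
    weeks = len(prior_years[0])
    return [sum(year[week] for year in prior_years) / len(prior_years)
            for week in range(weeks)]

def excess_deaths(observed, expected):
    """Observed minus expected deaths for each time period."""
    return [obs - exp for obs, exp in zip(observed, expected)]

# Hypothetical weekly death counts from all causes for the same three weeks
prior_years = [
    [1000, 1010, 990],
    [1020, 1000, 1010],
    [980, 990, 1000],
]
observed_this_year = [1200, 1350, 1500]

expected = expected_deaths(prior_years)
excess = excess_deaths(observed_this_year, expected)

print("Expected:", expected)
print("Excess by week:", excess)
print("Total excess deaths:", sum(excess))
```

Here the three weeks show 1,050 more deaths than the prior-year average would suggest, regardless of what cause was listed on each death certificate.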

Tells you: A general estimate of how deadly a disease is.

Doesn’t tell you:  

  • How many infected people are not included in the case number because they were not tested (which could artificially increase the reported fatality rate) or how many COVID-related fatalities were not included due to lack of testing or imperfect reporting (which could artificially decrease the reported fatality rate).
  • How many people are currently ill or recovering or what their health outcomes will be.
  • How the demographic characteristics and availability of treatment in the reporting area are affecting the outcomes for those infected, and how those may be changing over time.

Find more information about fatality rates here.


The data is only as good as the collection methods, the reporting methods and the interpretation or analysis. It’s a reporter’s job to add this context when reporting COVID data.

Check your public health department pages (here’s California’s, for example) for details about data reporting methods and standards. Be sure to check the methods used before reporting and note that in some cases, data may be updated retroactively. Whenever possible, consult an expert to provide an analysis of what the data does and doesn’t mean.  

Remember, trusted sources:

  • Acknowledge uncertainty
  • Demonstrate how the metrics are calculated
  • Point out how underlying assumptions and potentially imprecise data can affect estimates and calculations
  • Use multiple metrics together
  • Are cautious before identifying trends
  • Don’t conclude a trend based on a single data point

Find more helpful information about COVID data in these sources: