What environmental data are relevant to the study of infectious diseases like COVID-19?

The COVID-19 pandemic has triggered an increase in infectious disease modeling studies, some of which incorporate environmental parameters. These studies are driven by questions about the potential seasonality of disease transmission, potential comorbidities associated with other environmentally linked respiratory diseases, and a desire to improve predictions to inform future national and local policies to control transmission. This page has been developed to facilitate access to environmental data commonly used in infectious disease modeling. Email questions or feedback about this page to one.health@noaa.gov.

Environmental datasets for infectious disease modeling

The table below was designed to help users quickly locate environmental datasets for a given variable and timescale. Within each cell, one or more datasets are referenced along with key information, a link to download the data, and a link to read the metadata for the source. Additional information on the data types is available below this table.

Parameter

Observational Climate Record and Reanalyses

Short-term Forecasts and Predictions (deterministic and probabilistic)

Long-term Projections (scenario-based)

Temperature

  • Avg
  • Min
  • Max
  • Anomaly

Global Station Daily

Global Historical Climatology Network Daily - GHCN Data Access
Station-based
Spatial: global, coverage varies (see Understanding Data Sources below)
Temporal: daily, coverage varies by station (19th century to present)
Practicalities: available as *.csv files (HTTPS access).

U.S. Gridded Monthly

Gridded 5km GHCN-Daily Dataset (U.S. only) nClimGrid (aggregated to monthly values)
Gridded
Spatial: U.S., 5km
Temporal: monthly, 19th century to present
Practicalities: Available as ASCII text

Global Gridded Monthly

NOAAGlobalTemp (V5)
Spatial: global, 5° × 5°
Temporal: monthly, 19th century to present
Practicalities: Available as *.nc

Global Comprehensive (Reanalysis) Hourly

Historical Climate Forecast System (CFS) Time Series
Spatial: Global, 0.5° (approximately 56 km)
Temporal: hourly, 1979 - 2011
Practicalities: Available as *.grib2

Operational Climate Forecast System (CFS) Time Series
Spatial: Global, 0.5° (approximately 56 km)
Temporal: hourly, 2011 - present
Practicalities: Available as *.grib2
Filename (high-res)
tmp2m.gdas.YYYYMM.grib2
Filename (low-res)
tmp2m.l.gdas.YYYYMM.grib2
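
The YYYYMM pattern in the file names above can be expanded for a range of months with a short script. The following is a minimal sketch: the function names `cfs_tmp2m_filename` and `month_range` are hypothetical helpers, and the sketch only builds file names; it does not assume any particular download URL.

```python
from datetime import date


def cfs_tmp2m_filename(month: date, low_res: bool = False) -> str:
    """Build a 2 m temperature file name following the pattern listed above."""
    stamp = f"{month.year:04d}{month.month:02d}"  # YYYYMM
    middle = "l.gdas" if low_res else "gdas"
    return f"tmp2m.{middle}.{stamp}.grib2"


def month_range(start: date, end: date):
    """Yield the first day of each month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# File names for November 2019 through February 2020 (high-res pattern).
names = [cfs_tmp2m_filename(m) for m in month_range(date(2019, 11, 1), date(2020, 2, 1))]
```

Passing `low_res=True` produces the low-res pattern instead.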

Global Gridded Reanalysis Hourly

NASA Modern-Era Retrospective analysis for Research and Applications (MERRA-2)
Spatial: global, 0.5° x 0.625°
Temporal: hourly, 1980 - present

Global Station Sub-Daily

Integrated Surface Database
Temporal: sub-hourly, 1901 - present
Practicalities: available as *.csv or via Common Access or GIS services

European Global Comprehensive (Reanalysis) Hourly

ECMWF ERA5 climate reanalysis
Spatial: global, 0.25°x0.25°
Temporal: hourly, 1971 - present
Practicalities: available as *.grib files

NOAA NDFD Forecast

Metadata

Lead 1-3 days
Daily Max Temp
Daily Min Temp
Hourly Temp

Lead 4-7 days
Daily Max Temp
Daily Min Temp
Hourly Temp
Spatial: U.S., 5km
Temporal: hourly, 1 - 7 day lead time
Practicalities: available as *.bin
(rename to *.grib2)

Multi-Model Ensemble

NOAA National Blend of Models
Lead 0-10 days
Visualize Time Series and download CSV files by station.
(Extendable for CSV output, longer than 10 days to subseasonal, other parameters)
Spatial: U.S., coverage varies by station
Temporal: hourly, 10 day lead time
Practicalities: available as *.csv

Monthly & Seasonal

North American Multi-Model Ensemble
Spatial: global, varies
Temporal: 2019-2020, monthly and seasonal
Practicalities: See BAMS article for more information

CMIP6 Climate Projections

CMIP6-GFDL Data Search | CMIP6-GFDL | ESGF-CoG
Spatial: global, varied resolution
Temporal: 1850 - 2100, varied resolution
Practicalities: guidance for users

Humidity

  • Relative
  • Absolute
  • Specific
  • Dew point
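
Datasets often provide only some of these humidity measures, so modelers may need to convert between them. The sketch below shows one common way to do this, assuming the Magnus approximation for saturation vapor pressure over liquid water; the coefficients are one widely used set, the function names are hypothetical, and the default surface pressure is an assumption that should be replaced with observed pressure where available.

```python
import math


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation over liquid water: deg C in, hPa out."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_pct(temp_c: float, dew_point_c: float) -> float:
    """Relative humidity (%) from air temperature and dew point."""
    actual = saturation_vapor_pressure_hpa(dew_point_c)
    saturated = saturation_vapor_pressure_hpa(temp_c)
    return 100.0 * actual / saturated


def absolute_humidity_g_m3(temp_c: float, dew_point_c: float) -> float:
    """Water vapor density (g/m^3) from the ideal gas law for water vapor."""
    vapor_pressure = saturation_vapor_pressure_hpa(dew_point_c)
    return 216.7 * vapor_pressure / (temp_c + 273.15)


def specific_humidity_kg_kg(dew_point_c: float, pressure_hpa: float = 1013.25) -> float:
    """Mass of water vapor per mass of moist air (kg/kg).

    The default pressure is an assumed standard sea-level value.
    """
    vapor_pressure = saturation_vapor_pressure_hpa(dew_point_c)
    return 0.622 * vapor_pressure / (pressure_hpa - 0.378 * vapor_pressure)


# Example: air at 20 deg C with a 10 deg C dew point.
rh = relative_humidity_pct(20.0, 10.0)
ah = absolute_humidity_g_m3(20.0, 10.0)
q = specific_humidity_kg_kg(10.0)
```

The Magnus form is an approximation; applications that need higher accuracy, or values over ice, would use a different formulation.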

European Global Comprehensive (Reanalysis) Hourly

ECMWF ERA5 climate reanalysis
Spatial: global, 0.25°x0.25°
Temporal: hourly, 1971 - present
Practicalities: available as *.grib files

Global Station Sub-Daily

Integrated Surface Database
Temporal: sub-hourly, 1901 - present
Practicalities: available as *.csv or via Common Access or GIS services

NOAA NDFD Forecast

Metadata

Lead 1-3 days
Daily Max RH
Daily Min RH
Hourly RH
6hr Dew Point

Lead 4-7 days
Daily Max RH
Daily Min RH
Hourly RH
6hr Dew Point
Spatial: U.S., 5km
Temporal: hourly, 1 - 7 day lead time
Practicalities: available as *.bin
(rename to *.grib2)

CMIP6 Climate Projections

CMIP6-GFDL Data Search | CMIP6-GFDL | ESGF-CoG
Spatial: global, varied resolution
Temporal: 1850 - 2100, varied resolution
Practicalities: guidance for users

Ultraviolet Index (W/m²)

European Global Comprehensive (Reanalysis) Hourly

ECMWF ERA5 climate reanalysis
Spatial: global, 0.25°x0.25°
Temporal: hourly, 1971 - present
Practicalities: available as *.grib files

County-Level

CDC National Environmental Public Health Tracking Network
Spatial: U.S., state or county level
Temporal: monthly or annual avg, 2005 to 2015
Practicalities: Available as *.csv

Global Station Sub-Daily

Integrated Surface Database
Temporal: sub-hourly, 1901 - present
Practicalities: available as *.csv or via Common Access or GIS services

NOAA Forecast

NWS Global UV Index Forecast
Spatial: global, 0.5x0.5 degree
Temporal: hourly, 5 day lead time
Practicalities: available as *.grib2

CMIP6 Climate Projections

CMIP6-GFDL Data Search | CMIP6-GFDL | ESGF-CoG
Spatial: global, varied resolution
Temporal: 1850 - 2100, varied resolution
Practicalities: guidance for users

Understanding Data Sources

Station Data

Real-time and historical data are available for surface-based weather stations on every continent. The most commonly measured climate variables are temperature and precipitation, the latter being available for more than 100,000 locations worldwide. Some stations (such as those at airports) also observe additional variables—including pressure, wind, and cloud cover—on an hourly basis. Coverage of the stations is sparse in some regions; however, most airports have a station.
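
Station records distributed as *.csv files can be processed with standard tools. The following is a minimal sketch of deriving a daily mean temperature and an anomaly from a station file. The column names (`STATION`, `DATE`, `TMAX`, `TMIN`), the station ID, the values, the Celsius units, and the baseline are all assumptions made for illustration; check each dataset's metadata for its actual columns, units, and missing-value conventions.

```python
import csv
import io

# Hypothetical sample in an assumed layout; values are assumed to be in deg C.
SAMPLE_CSV = """STATION,DATE,TMAX,TMIN
HYPOTHETICAL01,2020-03-01,12.0,2.0
HYPOTHETICAL01,2020-03-02,14.0,4.0
HYPOTHETICAL01,2020-03-03,,5.0
HYPOTHETICAL01,2020-03-04,10.0,0.0
"""


def daily_mean_temps(csv_text: str) -> dict:
    """Return {date: (TMAX + TMIN) / 2}, skipping days with a missing value."""
    means = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["TMAX"] == "" or row["TMIN"] == "":
            continue  # station coverage varies, so gaps are expected
        means[row["DATE"]] = (float(row["TMAX"]) + float(row["TMIN"])) / 2.0
    return means


def anomalies(means: dict, baseline_c: float) -> dict:
    """Subtract a reference (baseline) temperature from each daily mean."""
    return {day: value - baseline_c for day, value in means.items()}


means = daily_mean_temps(SAMPLE_CSV)
anoms = anomalies(means, baseline_c=6.0)  # baseline chosen only for illustration
```

In practice the baseline would come from a long-term average for the same station and time of year rather than a fixed number.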

Interpolated Gridded Data

Many modeling applications require data in locations that do not have weather stations. Gridded datasets bridge this gap. These datasets consist of estimated data at evenly spaced intervals, such as every 5 kilometers. Researchers construct these datasets by using station data in combination with statistical methods (e.g., by interpolating station data to a uniform grid or fitting a statistical surface through the original observations). Gridded datasets are valuable in many modeling applications that require evenly spaced data as input. They are also useful in computing averages for societally relevant areas, such as counties and census tracts. Interpolated grids estimate unknown values at locations by using nearby points where values are known. Read more about interpolation
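
To illustrate the idea of estimating values between stations, the sketch below uses inverse distance weighting (IDW), one simple interpolation technique. It is not the method used to build any of the gridded datasets listed on this page, and it assumes planar coordinates and hypothetical station values.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at target=(x, y).

    stations is a list of ((x, y), value) pairs in the same planar units.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power  # nearer stations count more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(stations, x_coords, y_coords, power=2.0):
    """Estimate values at evenly spaced grid points (rows follow y_coords)."""
    return [[idw_estimate((x, y), stations, power) for x in x_coords] for y in y_coords]


# Two hypothetical stations 10 km apart, interpolated every 5 km.
stations = [((0.0, 0.0), 10.0), ((10.0, 0.0), 20.0)]
grid = idw_grid(stations, x_coords=[0.0, 5.0, 10.0], y_coords=[0.0])
```

The midpoint between the two stations receives equal weight from each, so its estimate is their average. Real gridded products typically also account for factors such as elevation, which this sketch ignores.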

Comprehensive Climate Monitoring Data

Comprehensive Climate Monitoring data, also known as reanalysis datasets, are produced by running climate models over a historical period while constraining many of the model's values to match observed values. These data are multivariate, spatially and temporally complete, and gridded. By combining observations with a model, the process yields consistent, high-resolution output that fills data gaps with a best estimate of the values that would most likely have been observed. For example, in a reanalysis covering 1950 to 2000, if a station was active only from 1970 onward, the 20 years of missing values can be estimated by constraining the model with the observations that were available during that time, filling in the temperature (and other values) that might have been observed at that station. Read more about reanalyses »

Climate Model Output

Climate models break the globe into a 3-dimensional grid and simulate a large number of variables in each grid cell at each time step using fundamental physics equations. Climate models are “spun up” by initializing them with observations and letting them run over many years of timesteps, solving the equations for each grid cell during each step, and allowing for interactions between adjacent grid cells. Climate model output should be used to estimate longer-term, probabilistic climate statistics rather than to predict deterministic values. Read more about climate models »

Multi-Model Ensemble

A Multi-Model Ensemble (MME) is a product that combines outputs from multiple models, typically from different modeling centers, by averaging them together into what is called an ensemble. In some cases, more than one model run (realization) per model is included, and each realization is run with slightly perturbed parameters to better represent uncertainty in the MME mean. MMEs are often more skillful than individual models, and better represent prediction error, because the ensemble as a whole balances out individual model biases to some extent. Read more about Multi-Model Ensembles »
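
The averaging described above can be sketched in a few lines. This example assumes one possible convention: realizations are first averaged within each model, and the models are then averaged with equal weight so that a model with more realizations does not dominate. The model names and forecast values are hypothetical, and operational products may weight or calibrate models differently.

```python
from statistics import mean, pstdev

# Hypothetical temperature forecasts (deg C) at three lead times.
# Each model maps to a list of realizations.
forecasts = {
    "model_a": [[10.0, 12.0, 14.0], [12.0, 14.0, 16.0]],
    "model_b": [[9.0, 11.0, 13.0]],
    "model_c": [[13.0, 15.0, 17.0]],
}


def model_means(realizations):
    """Average the realizations of one model at each lead time."""
    return [mean(values) for values in zip(*realizations)]


def ensemble_mean_and_spread(all_forecasts):
    """Equal-weight mean across models, plus the spread between models."""
    per_model = [model_means(runs) for runs in all_forecasts.values()]
    ens_mean = [mean(values) for values in zip(*per_model)]
    ens_spread = [pstdev(values) for values in zip(*per_model)]
    return ens_mean, ens_spread


ens_mean, ens_spread = ensemble_mean_and_spread(forecasts)
```

The spread (here the population standard deviation across model means) gives a rough indication of how much the models disagree at each lead time.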

This section describes the file formats most commonly used to represent environmental information, as well as packages available for accessing and analyzing that information in specific programming languages.

Understanding File Formats

  • CSV
    • A comma-separated values file, which can be opened in most text or spreadsheet editing applications.
  • NC
    • A NetCDF file, which is used to store geospatial information as arrays of values.
  • GRIB2
    • The GRIdded Binary, or GRIB, file format was defined by the World Meteorological Organization to store two-dimensional data.
  • BIN
    • BIN files are binary files, but for the purpose of analyzing the datasets represented here, you can rename *.bin files to *.grib2 files and most readers will be able to display them.

Language Specific Sources

R

Python
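
As a minimal Python sketch of the renaming step described for BIN files under Understanding File Formats, the example below renames every *.bin file in a directory to *.grib2 using only the standard library. The function name and the demo file name are hypothetical, and the demo runs in a temporary directory so no real data are touched.

```python
import tempfile
from pathlib import Path


def rename_bin_to_grib2(directory):
    """Rename every *.bin file in directory to *.grib2; return the new paths."""
    renamed = []
    for path in sorted(Path(directory).glob("*.bin")):
        target = path.with_suffix(".grib2")
        if target.exists():
            continue  # do not overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed


# Demo with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hypothetical_forecast.bin").write_bytes(b"")
    new_names = [p.name for p in rename_bin_to_grib2(tmp)]
```

Renaming changes only the file extension; the contents are untouched, so a GRIB2 reader is still needed to open the result.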

Glossary of Terms & Acronyms

  • CDC = Centers for Disease Control and Prevention
  • CFS = Climate Forecast System
  • CMIP6 = Coupled Model Intercomparison Project, version 6
  • CoG = University of Colorado collaboration environment in support of the ESGF
  • ECMWF = European Centre for Medium-Range Weather Forecasts
  • ERA5 = ECMWF Reanalysis of the Atmosphere, 5th generation
  • ESGF = Earth System Grid Federation
  • GFDL = NOAA’s Geophysical Fluid Dynamics Laboratory
  • GHCN = Global Historical Climatology Network
  • GIS = Geographical Information System
  • HTTPS = HyperText Transfer Protocol Secure
  • MERRA = Modern-Era Retrospective analysis for Research and Applications
  • NDFD = NOAA’s National Digital Forecast Database
  • NMME = North American Multi-Model Ensemble
  • NWS = National Weather Service
  • RH = Relative Humidity
  • UV = Ultraviolet
  • W/m² = Watts per square meter

Acknowledgments

This page was developed by NOAA’s One Health Team to serve the COVID-19 and broader health research and decision-making community. Acknowledgment and thanks go to: Hunter Jones, Mary Lindsey, Richard Glupker, Stan Benjamin, Georg Grell, and Juli Trtanj, NOAA Research (OAR), and Jennifer Runkle, Russell Vose, and Jeff Privette, National Environmental Satellite and Data Information Service (NESDIS).

Example Model Studies

This list of studies is compiled by the NOAA Central Library. This list is a sampling of COVID-19 studies employing climate variables and will be updated periodically. The presence of a study in this list does not imply endorsement. Many of the listed studies have not been peer reviewed.