In this article and the video below, I will show you step by step how to set up the API correctly and download ERA-5 climate datasets using Python code. ERA-5 datasets include grid-level monthly averages and hourly data for rainfall, 2m temperature, sea surface temperature, land cover, soil characteristics, evaporation, runoff (flooding), wind speed, and much more.
I will show you examples of how to fetch data for the whole world or for a specific country or region, and how to process them using RStudio.
Note: You can manually request the files from their website (which can be time-consuming), or you can use an API to make the process faster in Python.
I struggled a lot to set up the whole process correctly and to download and extract the datasets, so when I finally managed it, I decided to make a video tutorial to help others as well. This short article is an introduction to how best to access ERA-5 data. There are other articles out there on the topic as well (links at the end).
Watch the full video below to understand how you can download climate datasets from ERA-5 using Python.
Note that you will need to set up the CDS API properly on your computer for this to work; the video below shows you how I did it.
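As a rough aid for the setup step, here is a minimal Python sketch that checks whether a CDS API credentials file looks complete. It assumes the usual layout of the `.cdsapirc` file in your home folder, with one `url:` line and one `key:` line; the helper names `parse_cdsapirc` and `missing_fields` are made up for this illustration, and the exact URL and key values are the ones shown on your own CDS account page.

```python
from pathlib import Path


def parse_cdsapirc(text):
    """Parse the 'name: value' lines of a .cdsapirc file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, because the URL contains colons too.
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


def missing_fields(settings):
    """Return the required fields ('url', 'key') that are absent or empty."""
    return [name for name in ("url", "key") if not settings.get(name)]


if __name__ == "__main__":
    rc_path = Path.home() / ".cdsapirc"  # assumed default location
    if not rc_path.exists():
        print(f"{rc_path} not found: create it with your 'url' and 'key' lines")
    else:
        gaps = missing_fields(parse_cdsapirc(rc_path.read_text()))
        print("Credentials file looks complete" if not gaps else f"Missing: {gaps}")
```

If the script reports a missing field, copy the two lines again from your CDS profile page before trying a download.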
This dataset is FREELY available to everyone! Even better, there is an API that makes the dataset easy to use.
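To give an idea of what an API download looks like, here is a hedged Python sketch using the `cdsapi` package (`pip install cdsapi`). The dataset name and the request keys below are assumptions modeled on the "Show API request" box of the CDS download form; in particular, the current CDS uses `data_format`, while older versions used `format`. Always copy the exact request from the form for your own dataset. The helper names `build_request` and `download` are made up for this example.

```python
def build_request(variable, year, month):
    """Build a request dict for one month of monthly-averaged data.

    With no "area" key, the request covers the whole world.
    """
    return {
        "product_type": ["monthly_averaged_reanalysis"],  # assumed value
        "variable": [variable],
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "time": ["00:00"],
        "data_format": "netcdf",  # assumed key; older CDS versions used "format"
    }


def download(dataset, request, target):
    """Send the request to the CDS and save the result to `target`."""
    import cdsapi  # third-party package; imported here so the rest runs without it

    client = cdsapi.Client()  # reads the url and key from ~/.cdsapirc
    client.retrieve(dataset, request, target)


if __name__ == "__main__":
    request = build_request("2m_temperature", 2019, 1)
    print(request)
    # Uncomment to actually download (needs a CDS account and accepted licence):
    # download("reanalysis-era5-single-levels-monthly-means", request, "temperature.nc")
```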
How to Request ERA5 Data for a Specific Subregion (e.g., country, region, district, or city)
IMPORTANT UPDATE: ERA-5 now allows you to extract data at the sub-region level. All you need to do is enter the coordinates of a bounding box (bbox) around your area manually, as shown below, and it will process and provide data only for the specified area.
Note: Instead of the whole world, I specified the coordinates of a bounding box (a rectangle) around the area for which I wanted the data. This is much easier and faster than downloading data for the whole world and cropping it later in R.
For example, the bbox info above is for the U.S. (a rectangle that covers the U.S.). I got that information from https://bboxfinder.com , where I used the rectangle tool on the left, drew a rectangle around my country/area of interest, then copied the resulting bbox coordinates and pasted them into the ERA5 sub-region box (as seen in the picture above).
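If you prefer to do the copy-and-paste step in code, the sketch below reorders a bbox string into the North, West, South, East order that the CDS sub-region box and the API `area` key expect. It assumes the bbox string is in `west,south,east,north` order (longitude first), which is how I understand the default bboxfinder output; check this against what the site shows you. The function name `bbox_to_cds_area` is made up for this example.

```python
def bbox_to_cds_area(bbox):
    """Convert a 'west,south,east,north' string to [north, west, south, east]."""
    west, south, east, north = (float(part) for part in bbox.split(","))
    return [north, west, south, east]


if __name__ == "__main__":
    # Rectangle around the United States used in this article.
    print(bbox_to_cds_area("-128.21,24.49,-66.05,49.884"))
    # → [49.884, -128.21, 24.49, -66.05]
```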
TIP: You need to provide the coordinates as below:
Here is a screenshot of how I got the correct bounding box info for the United States in bboxfinder (this can be applied to any country or area of your choice).
Note that those are:
Northern Latitude: 49.884 (the upper line of the rectangle)
Southern Latitude: 24.49 (lower line of the rectangle)
Eastern Longitude: -66.05 (right line of the rectangle)
Western Longitude: -128.21 (left line of the rectangle).
If you mix up latitudes and longitudes (for example, by entering a longitude in the North field), your extracted data will be for another part of the world.
With this technique, you can now request data over time for only a specific country, city, or region instead of downloading the whole world every time.
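Because swapped coordinates are easy to miss, here is a small Python sketch that checks a bounding box before it goes into a request. It assumes longitudes in the -180 to 180 range (as in the example above) and the North, West, South, East order for the `area` key; the function names `check_area` and `add_area` are hypothetical helpers, not part of any library.

```python
def check_area(north, west, south, east):
    """Return a list of problems found in a bounding box (empty if none)."""
    problems = []
    for name, value in (("north", north), ("south", south)):
        if not -90 <= value <= 90:
            problems.append(f"{name}={value} is not a valid latitude")
    for name, value in (("west", west), ("east", east)):
        if not -180 <= value <= 180:
            problems.append(f"{name}={value} is not a valid longitude")
    if north <= south:
        problems.append("north must be greater than south")
    return problems


def add_area(request, north, west, south, east):
    """Return a copy of `request` restricted to the given bounding box."""
    problems = check_area(north, west, south, east)
    if problems:
        raise ValueError("; ".join(problems))
    return {**request, "area": [north, west, south, east]}


if __name__ == "__main__":
    # The U.S. rectangle from this article passes the check.
    print(add_area({"variable": ["2m_temperature"]}, 49.884, -128.21, 24.49, -66.05))
    # Swapping the northern latitude and the western longitude is caught.
    print(check_area(-128.21, 49.884, 24.49, -66.05))
```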
More on ERA5 Datasets
ERA5 offers hourly and monthly gridded (spatial) information for a number of land and oceanic climate variables. Different variables come at different spatial resolutions: some are very fine, others quite coarse.
ERA5 data are updated regularly with new products that you can download. Updates of the dataset are offered to users within a 7-day time frame.
Besides the regular new products, I really like the fact that you can download data for multiple time periods in one single file instead of multiple files, which would force you to write a loop to extract them all.
With a single file containing multiple layers (for time periods), extraction is much easier!
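When you work with such a multi-layer file, it helps to know which layer belongs to which period. The Python sketch below lists the periods you would expect for a set of years and months, assuming the layers in the file are stored in chronological order (worth confirming against the time stamps in your own file). The function name `expected_layers` is made up for this example.

```python
def expected_layers(years, months):
    """List the year-month labels of a multi-period file, in time order."""
    return [f"{year}-{month:02d}" for year in sorted(years) for month in sorted(months)]


if __name__ == "__main__":
    # January through May 2019: one layer per month.
    labels = expected_layers([2019], range(1, 6))
    for index, label in enumerate(labels, start=1):
        print(f"layer {index}: {label}")
```

For two years and twelve months you would expect 24 layers; if the file you read has a different count, the request probably did not include what you intended.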
There are spatial datasets that range from 1950 to the present, or 1981 to the present depending on the variable of interest.
Most of the variables in the ERA-5 database can be downloaded in two different formats: NetCDF (.nc) and GRIB (.grib).
How to process ERA5 Datasets in R
I use RStudio to process the NetCDF data pretty easily with the "raster" package.
Here is a nice blog post showing you how to easily extract a ".nc" (NetCDF) file using the raster package.
Note: There is a difference between using the function "raster" and the function "brick" to read the data.
Example: Suppose you have downloaded the temperature data for Cameroon from January through May 2019 and the file extracted is labeled “temperature.nc” on your PC. This file is multi-layered where each layer represents the temperature for a month across space.
To read it in R:

    library(raster)  # install this package first; reading .nc files also needs the ncdf4 package
    library(sf)

    filepath <- "C:/Documents/data/era5-data/temperature.nc"

    # brick() reads every layer, so you can still extract data for each month
    # separately (5 layers for the 5 months).
    data.brick <- brick(filepath)

    # raster() reads a single layer only (the first one by default), so each
    # lon-lat grid cell will have just ONE temperature value (probably not what you want).
    data.raster <- raster(filepath)

    # Load the map (shapefile) of Cameroon at the lowest admin level available.
    # You can do this for any country using its ISO code. Some countries have
    # level = 2 as the lowest level and others level = 3. Play around with it!
    cmr_shape <- getData("GADM", country = "CMR", level = 3)
    plot(cmr_shape)                # visualize the shapefile and all the admin-3 units
    cmr_sf <- as(cmr_shape, "sf")  # convert the spatial object into an sf object

    # Extract data for the municipalities of Cameroon: for each month, average all
    # grid cells that overlap each municipality. df = TRUE returns a data frame.
    extract_cmrdata <- extract(data.brick, cmr_shape, fun = mean, df = TRUE)

    # Combine the extracted data with the map attributes: region name, country name,
    # district name, municipality name, and the temperature values.
    final_data <- cbind(st_drop_geometry(cmr_sf), extract_cmrdata)

You can play around with the options "raster" offers by reading the documentation, but be very careful when choosing between raster and brick to read the file.

Note: You can add the option method = "bilinear" in extract(). It interpolates from the values of the four closest grid cells. In the raster package this option applies when you extract at point locations (coordinates) rather than over polygons.

This is most useful if you have some missing data in your raster. Data for developing countries are sometimes incomplete, and given the spatial correlation in satellite data (Donaldson and Storeygard, 2016), I think you can fill in the missing values of some grid cells using nearby non-missing grid cells.

The default is method = "simple", which simply extracts the value of the grid cell in which the coordinates fall, without any interpolation.

Also, you can add the option weights = TRUE. It calculates an area-weighted average instead of a simple average: each grid cell is weighted by the approximate fraction of the cell that falls inside the polygon, so cells fully inside count more than cells that only cut across the border.
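To make the arithmetic behind the bilinear and area-weighted options concrete, here is a small Python sketch of the two formulas. It is only an illustration of the general technique with made-up numbers, not the raster package's actual implementation; the function names `bilinear` and `weighted_mean` are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, v00, v10, v01, v11):
    """Interpolate at (x, y) from the four surrounding grid-cell centres.

    v00 is the value at (x0, y0), v10 at (x1, y0),
    v01 at (x0, y1) and v11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # 0 at x0, 1 at x1
    ty = (y - y0) / (y1 - y0)  # 0 at y0, 1 at y1
    bottom = v00 * (1 - tx) + v10 * tx  # interpolate along x at y0
    top = v01 * (1 - tx) + v11 * tx     # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # then interpolate along y


def weighted_mean(values, weights):
    """Average the cell values, weighting each by its share inside the polygon."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # A point in the middle of four cells gets the average of the four values.
    print(bilinear(0.5, 0.5, 0, 1, 0, 1, 10, 20, 30, 40))  # → 25.0
    # One cell fully inside a polygon (weight 1.0) and one half inside (0.5).
    print(weighted_mean([20, 30], [1.0, 0.5]))
```

In the second example the simple average would be 25, while the weighted average is pulled toward the cell that lies fully inside the polygon.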
About using raster vs. brick to read: Suppose you only downloaded annual rainfall data for 2019. The file will then have a single layer instead of multiple layers.
In that case you can use either brick or raster to read the file; it won't really matter. But if your file has multiple layers (for example, you selected multiple years or months in ERA5 during extraction), which usually represent time (i.e., day, month, year), then I strongly recommend using the "brick" command.
Once you’ve learned how to extract data in a more usable form, you can use these datasets to study anything you want.
ERA-5 replaces the ERA-Interim reanalysis dataset.
If you want to learn how I extract night light intensity data for any region or city of the world with more extraction tips using the raster package, read this piece:
Nightlights Satellite Data (Free Download) and Tips to Extract Nightlight Intensity
Other Resources I used to learn how to download ERA5 myself
LINK TO MANUALLY DOWNLOAD ERA-5 REANALYSIS DATA