Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data

*Note: A previous version of my site, which had about 5000 monthly views was hacked, and I rebuilt this from scratch. Some content you read before may not be available. I may earn commissions on some links at no cost to you. Opinions are my own.
Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 1

Want to download hourly, monthly, and annual pollution data? You’ve come to the right place!

Below I show you where to download pollution data from multiple sources and how to easily extract the value of pollutants (e.g., PM 2.5, PM 10, nitrogen, carbon monoxide, Ozone, etc.) for any area, region, or point coordinates you may want.

Before going any further, check out my working paper on Air Pollution, Avoidance Behavior, and Labor Market outcomes: Evidence from the United States, which shows how I use different pollution data below for the purpose.

Source 1: The CAMS Atmosphere Data Store (ADS)

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 3
  • FREE
  • Date range: 2003 – Present
  • Coverage: Global
  • 3-hourly (every 3 hours) & Monthly
  • Resolution: 0.75 deg * 0.75 deg (approximately 80 km). Perhaps too big for you.
  • Format: .grib or netCDF files

Other info: This dataset contains almost ALL possible atmospheric variables (PM2.5, PM1, PM10, ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), Dust, Aerosols, Black Carbon, Methane, and so much more).

Click Here to Download (2003-Present).

Source 2: Global Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD) with GWR

  • FREE
  • Date range: 1998 – 2019
  • Coverage: Global
  • Annual Data
  • Resolution: 0.01 degrees (approx. 1km sq for each grid)
  • Format: .geotiff files

Click Here to Download (1998-2019).

More info

About the Method

” Geographically Weighted Regression (GWR) is used with global ground-based measurements from the World Health Organization (WHO) database to predict and adjust for the residual PM2.5 bias per grid cell in the initial satellite-derived values. These estimates are primarily intended to aid in large-scale studies. Gridded data sets are provided at a resolution of 0.01 degrees to allow users to agglomerate data as best meets their particular needs. Data sets are gridded at the finest resolution of the information sources that were incorporated, but do not fully resolve PM2.5 gradients at the gridded resolution due to influence by information sources at coarser resolution. The data are distributed as GeoTIFF files and are in WGS84 projection.”

Source 3: Sentinel-5P OFFL AER AI: Offline UV Aerosol Index

  • FREE
  • Date range: 2018-07-04 to Present
  • Coverage: Global
  • Daily Data
  • Resolution: 1113.2 meters
  • Format: maybe .netcdf (I don’t know how to process Google Earth Engine data yet). You can watch this YouTube video below to learn more.

Other info: This is probably one of the lowest resolution data on pollution. The issue is the time period

This dataset provides offline high-resolution imagery of the UV Aerosol Index (UVAI), also called the Absorbing Aerosol Index (AAI).

When the AAI is positive, it indicates the presence of UV-absorbing aerosols like dust and smoke. It is useful for tracking the evolution of episodic aerosol plumes from dust outbreaks, volcanic ash, and biomass burning.

Click Here to Access the Sentinel Data

Source 4: WUSTL Pollution Data

Ground-level fine particulate matter (PM2.5). You can also derive Dust and Sea-salt from it too.

  • FREE
  • Date range: 1998-2020
  • Coverage: Global
  • annual and monthly
  • Resolution: 0.01° × 0.01° degrees (approx. 1km sq for each grid); 0.05° × 0.05°
  • Format: .netcdf (.nc)

Click Here to Download (1998-2019).

You can also get surface SO2 HERE.

Source 5: CACES Pollution Data

This dataset tracks six pollutants:

Four gases: O3, CO, SO2, NO2; two aerosols: PM10, PM2.5) throughout the contiguous U.S.

  • FREE
  • Coverage: United States only
  • Date range: 1979 – 2015 (SO2, NO2), 1988 – 2015 (PM10), 1990-2015 (CO), 1999-2015 (PM2.5) and 1979-2015 (O3)
  • annual and monthly
  • Resolution: Data are available at national, state, county, census tract, and census block group levels
  • Format: .excel

Click Here to Download (1998-2019).

Make sure you put your email and org name before proceeding.

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 5

Then select your desired pollutant.

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 7

Source 6: EPA Pollution Data (EPA = Environmental Protection Agency)

Time range: 1999 – Present

Pollutants covered: PM2.5, Pm 10, Ozone, Air Quality Index (AQI), Ozone etc.

Type: Daily, Monthly

Coverage: United States only

Data format: Excel files

There are two ways to download the EPA:

Source 1: Pregenerated files to download directly, daily average by county

Download from Source #1

Below is what the page looks like:

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 9

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 11

The data in each excel will cover all monitors across the whole U.S.

Source 2: EPA Query tool

Download from Source #2

Below is a screenshot of how you can download it:

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 13

TIP: I would suggest downloading the excel files if your study covers the U.S. and many years, or using the query tool if you just want data for a state or specific monitoring sites. I used the excel files for my study on Air Pollution, Avoidance Behavior, and Labor Market outcomes: Evidence from the United States. Check it out. 😉

R Codes to Extract Pollutants from Different Datasets

The codes below only apply to the spatial datasets and not the EPA data.

Example Code to Extract the CAMS ADS Data for China counties

You just need to change your shapefile for your country (at any admin level) to process yours as well.

This is what the final dataset will look like:

The final values are the county-level average of pollutants by month & year.

Pollution Satellite Data (Free Download) and Codes to Extract Local Pollution Data 15

#GOAL: Get and process monthly Pollutants data for CHINA Counties from 2010-2020 from CAMS Satellite data
# Data source: https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-reanalysis-eac4?tab=form
 
rm(list=ls())
library(sp)
library(raster)
library(spatstat)
library(rgdal)    
library(maptools)
library(tmap)
library(tmaptools)
library(ncdf4)
library(leaflet)
library(dplyr)
library(plyr)
library(sf)
library(tidyr)
library(data.table)
library(haven)
library(ncdf4)
library(foreign)
library(ggpubr)
library(pacman) # to use pivot to reshape
gc()
 
 
setwd (“C:/Users/dbere/Dropbox/COLLABS/subsidy, crop residue burn, air quality/inputdata/pollution”)
 
########################################################
# Processing PM 2.5 Satellite CAMS data on same days
#######################################################
 
 
china <- raster::getData(name=”GADM”, country=”CHN”, level=3)
china <- as(china, “sf”)
 
chinamap <- st_read(“C:/Users/dbere/Dropbox/COLLABS/subsidy, crop residue burn, air quality/sample counties china/sample_counties1.shp”)
 
# listing files in a vector
 
all.pollution.files<- list.files( “.”, pattern = “.*.nc$”, recursive = TRUE, full.names = FALSE) # we are not loading full path because we will use each file name in a loop
 
########################################################
# Processing PM 10 Satellite CAMS data Monthly county-level
#######################################################
 
pollutant <- brick(all.pollution.files[1])
pollutant
 
## Extracting for US Counties monthly
chn.monthly.pol.extract <- cbind(chinamap, raster::extract(pollutant, chinamap, fun=mean, weight=TRUE, method=”simple”, df=TRUE))
chn.monthly.pol.extract$ID <- NULL  # Removing the ID column
 
 
###############
# Process for all pollutants at ONCE
 
for (i in 2:length(all.pollution.files)) {
  newfile <- brick(all.pollution.files[i]) # this will be the second file in first iteration and so on
 
  #Extracting pollutant file for china map
  extractpol <- cbind(chinamap, raster::extract(newfile, chinamap,
                                       fun=mean, weight=TRUE, method=”simple”, df=TRUE))
  extractpol$ID <- NULL # removing column ID
 
  j<- gsub(“_2009_2020.nc”, “”, all.pollution.files[i], fixed = TRUE) # the name of the file in “i” is assigned to object j without 2009_2020.nc
  assign(j, extractpol) #this assign the name extracted from i to the new object or data table extractpol
 
  gc()
}
 
#################
# Fixing variable names
 
# Removing unnecessary characters in variable names ( currently vars have the form X2010.01) = transform it to pm10_2010_01_01 so that it can be used in other programs (Stata)
 
#—————–
# Exporting Data For PM 10
names(chn.monthly.pol.extract) <-gsub(“\\X”,”pm10_”,names(chn.monthly.pol.extract)) 
names(chn.monthly.pol.extract) <-gsub(“\\.”,””,names(chn.monthly.pol.extract)) 
 
chn.monthly.pol.extract.long <-   pivot_longer(st_drop_geometry(chn.monthly.pol.extract),
                                                 cols = starts_with(“pm10_”), # Select columns to reshape
                                                 names_sep = “_”, # Character that divides variable name and suffix
                                                 names_to =  c(“.value”, “yearmonth”),
                                                 values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                 values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
write_dta(chn.monthly.pol.extract.long, “./sat_pm10_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For PM2.5
names(CAMS_monthly_avg_pm2_5_CHINA) <-gsub(“\\X”,”pm25_”,names(CAMS_monthly_avg_pm2_5_CHINA)) 
names(CAMS_monthly_avg_pm2_5_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_pm2_5_CHINA)) 
 
CAMS_monthly_avg_pm2_5_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_pm2_5_CHINA),
                                               cols = starts_with(“pm25_”), # Select columns to reshape
                                               names_sep = “_”, # Character that divides variable name and suffix
                                               names_to =  c(“.value”, “yearmonth”),
                                               values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                               values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
write_dta(CAMS_monthly_avg_pm2_5_CHINA.long, “./sat_pm25_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For Aerosols
names(CAMS_monthly_avg_total_aerosols_550nm_CHINA) <-gsub(“\\X”,”aerosols_”,names(CAMS_monthly_avg_total_aerosols_550nm_CHINA)) 
names(CAMS_monthly_avg_total_aerosols_550nm_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_aerosols_550nm_CHINA)) 
 
CAMS_monthly_avg_total_aerosols_550nm_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_aerosols_550nm_CHINA),
                                                    cols = starts_with(“aerosols_”), # Select columns to reshape
                                                    names_sep = “_”, # Character that divides variable name and suffix
                                                    names_to =  c(“.value”, “yearmonth”),
                                                    values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                    values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
write_dta(CAMS_monthly_avg_total_aerosols_550nm_CHINA.long, “./sat_aerosols_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For Carbone Monoxide
names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA) <-gsub(“\\X”,”carbmonoxide_”,names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA)) 
names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA)) 
 
CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA),
                                                                   cols = starts_with(“carbmonoxide_”), # Select columns to reshape
                                                                   names_sep = “_”, # Character that divides variable name and suffix
                                                                   names_to =  c(“.value”, “yearmonth”),
                                                                   values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                                   values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
write_dta(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long, “./sat_carbmonoxide_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For Nitrogen
names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA) <-gsub(“\\X”,”nitrogendioxide_”,names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA)) 
names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA)) 
 
CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA),
                                                                           cols = starts_with(“nitrogendioxide_”), # Select columns to reshape
                                                                           names_sep = “_”, # Character that divides variable name and suffix
                                                                           names_to =  c(“.value”, “yearmonth”),
                                                                           values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                                           values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL

write_dta(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long, “./sat_nitrogendioxide_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For Nitrogen
names(CAMS_monthly_avg_total_column_ozone_CHINA) <-gsub(“\\X”,”ozone_”,names(CAMS_monthly_avg_total_column_ozone_CHINA)) 
names(CAMS_monthly_avg_total_column_ozone_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_ozone_CHINA)) 
 
CAMS_monthly_avg_total_column_ozone_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_ozone_CHINA),
                                                                            cols = starts_with(“ozone_”), # Select columns to reshape
                                                                            names_sep = “_”, # Character that divides variable name and suffix
                                                                            names_to =  c(“.value”, “yearmonth”),
                                                                            values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                                            values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
write_dta(CAMS_monthly_avg_total_column_ozone_CHINA.long, “./sat_ozone_counties_monthly_china_long_2009_2020.dta”, version=14)
 
#—————–
# Exporting Data For Sulf Dioxide
names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA) <-gsub(“\\X”,”sulfdioxiode_”,names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA)) 
names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA)) 
 
CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long <-   pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA),
                                                                 cols = starts_with(“sulfdioxiode_”), # Select columns to reshape
                                                                 names_sep = “_”, # Character that divides variable name and suffix
                                                                 names_to =  c(“.value”, “yearmonth”),
                                                                 values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                                                 values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL

write_dta(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long, “./sat_sulfdioxiode_counties_monthly_china_long_2009_2020.dta”, version=14)
 
###################################
## APPEND MULTIPLE FILES AT ONCE
##################################
 
  allfiles <- cbind(chn.monthly.pol.extract.long, CAMS_monthly_avg_pm2_5_CHINA.long[,12], CAMS_monthly_avg_total_aerosols_550nm_CHINA.long[,12],
                    CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long[,12], CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long[,12],
                    CAMS_monthly_avg_total_column_ozone_CHINA.long[,12], CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long[,12])
   
  # Multiplying values by the corresponding multipler: PM pollutants have to be multiplied by 1 billion to have regular known values
  # All units are described here at bottom of: https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-reanalysis-eac4?tab=overview
allfiles <- allfiles%>%
  mutate(pm10 = pm10* 1000000000, pm25 = pm25* 1000000000, # from kg/m3 to mg/m3 is 1 billion factor
         )
 
write_dta(allfiles, “./all_pollutants_counties_monthly_china_long_2009_2020.dta”, version=14

Example Code to Extract the WUSTL PM2.5 Data for Africa Admin 3 shapefiles

Make sure you change all the local paths and adapt to your context.

#GOAL: Get Daily Pollutant data from 1998-2019 from CAMS Satellite data
# Data source: https://sites.wustl.edu/acag/datasets/surface-pm2-5/#V5.GL.02
 
  rm(list=ls())
  library(sp)
  library(raster)
  library(spatstat)
  library(rgdal)    
  library(maptools)
  library(tmap)
  library(tmaptools)
  library(ncdf4)
  library(leaflet)
  library(dplyr)
  library(plyr)
  library(sf)
  library(tidyr)
  library(data.table)
  library(haven)
  library(ncdf4)
  library(foreign)
  library(ggpubr)
  library(mapview)
  library(pacman) # to use pivot to reshape
  library(stars)
  library(rgdal)
  gc()
 
  final <-“C:/Documents/Data/otherdata/WUSTL_pollution_data_annual/WUSTL_DHS_Africa_pm25”
 
########################################################
# Processing PM 2.5 Satellite WUSTL yearly data
#######################################################
input <-“./Data/otherdata/WUSTL_pollution_data_annual/PM25_full_annual/0.05 degrees”
 
  # # DHS Africa Map
  africashapefile.sf<- st_read(“C:/Users/dbere/Documents/DATA/DHS DATA/Africa shapefiles/africa_shape_dhs_countries.shp”)
  africashapefile.sf # this allows to get extent of boundary box containing almost all African countries. I constructed this map myself to use in different situations. Use your own map. Check the GADM website to download country maps/shapefiles.
 
  #Selecting some countries
  africashapefile1.sf <- africashapefile.sf%>%
    dplyr::filter(grepl(‘Angola|Benin|Burkina Faso|Burundi|Cameroon|Chad|Congo Democratic Republic|Ivoire|Ethiopia|Ghana|Guinea|Kenya’, NAME_0)) # forma large map I constructed, I am just selecting some countries I want to extract for
 
  africashapefile1 <- st_drop_geometry(africashapefile1.sf) # dropping column geometry in this clone dataframe. Will be used later.
  levels(as.factor(africashapefile1.sf$NAME_0)) # checking the country names
 
  africashapefile1 <- africashapefile1 %>%  #selecting important columns in dataframe
    dplyr::select(GID_0, NAME_0, GID_1, NAME_1, GID_2, NAME_2, GID_3, NAME_3, GID_4, NAME_4)
 
  africashapefile1.sf <- africashapefile1.sf %>%  # selecting only geometry column
    dplyr::select(NL_NAME_1)
  
########################################################
# Processing PM 2.5 WUSTL
#######################################################
  st_bbox(africashapefile1.sf)
  # Open and crop to Africa extent
 
  ## 1998
  pollutant.98 <- brick(file.path(input, “ACAG_PM25_V4GL03_199801_199812_0p05.nc”))
  pollutant.98
  pollutant.98 <- raster::crop(pollutant.98, c(-18.979003,51.084828, -35.044650,26.626175))
  #mapview(pollutant.98[[1]])
  names(pollutant.98) <- “pm25_1998” #rename layer
 
  ## 1999
  pollutant.99 <- brick(file.path(input, “ACAG_PM25_V4GL03_199901_199912_0p05.nc”))
  pollutant.99 <- raster::crop(pollutant.99, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.99) <- “pm25_1999” #rename layer
 
  ## 2000
  pollutant.2000 <- brick(file.path(input, “ACAG_PM25_V4GL03_200001_200012_0p05.nc”))
  pollutant.2000 <- raster::crop(pollutant.2000, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2000) <- “pm25_2000” #rename layer
 
  ## 2001
  pollutant.2001 <- brick(file.path(input, “ACAG_PM25_V4GL03_200101_200112_0p05.nc”))
  pollutant.2001 <- raster::crop(pollutant.2001, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2001) <- “pm25_2001” #rename layer
 
  ## 2002
  pollutant.2002 <- brick(file.path(input, “ACAG_PM25_V4GL03_200201_200212_0p05.nc”))
  pollutant.2002 <- raster::crop(pollutant.2002, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2002) <- “pm25_2002” #rename layer
 
  ## 2003
  pollutant.2003 <- brick(file.path(input, “ACAG_PM25_V4GL03_200301_200312_0p05.nc”))
  pollutant.2003 <- raster::crop(pollutant.2003, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2003) <- “pm25_2003” #rename layer
 
  ## 2004
  pollutant.2004 <- brick(file.path(input, “ACAG_PM25_V4GL03_200401_200412_0p05.nc”))
  pollutant.2004 <- raster::crop(pollutant.2004, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2004) <- “pm25_2004” #rename layer
 
  ## 2005
  pollutant.2005 <- brick(file.path(input, “ACAG_PM25_V4GL03_200501_200512_0p05.nc”))
  pollutant.2005 <- raster::crop(pollutant.2005, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2005) <- “pm25_2005” #rename layer
 
  ## 2006
  pollutant.2006 <- brick(file.path(input, “ACAG_PM25_V4GL03_200601_200612_0p05.nc”))
  pollutant.2006 <- raster::crop(pollutant.2006, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2006) <- “pm25_2006” #rename layer
 
  ## 2007
  pollutant.2007 <- brick(file.path(input, “ACAG_PM25_V4GL03_200701_200712_0p05.nc”))
  pollutant.2007 <- raster::crop(pollutant.2007, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2007) <- “pm25_2007” #rename layer
 
  ## 2008
  pollutant.2008 <- brick(file.path(input, “ACAG_PM25_V4GL03_200801_200812_0p05.nc”))
  pollutant.2008 <- raster::crop(pollutant.2008, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2008) <- “pm25_2008” #rename layer
 
  ## 2009
  pollutant.2009 <- brick(file.path(input, “ACAG_PM25_V4GL03_200901_200912_0p05.nc”))
  pollutant.2009 <- raster::crop(pollutant.2009, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2009) <- “pm25_2009” #rename layer
 
  ## 2010
  pollutant.2010 <- brick(file.path(input, “ACAG_PM25_V4GL03_201001_201012_0p05.nc”))
  pollutant.2010 <- raster::crop(pollutant.2010, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2010) <- “pm25_2010” #rename layer
 
  ## 2011
  pollutant.2011 <- brick(file.path(input, “ACAG_PM25_V4GL03_201101_201112_0p05.nc”))
  pollutant.2011 <- raster::crop(pollutant.2011, c(-18.979003,51.084828, -35.044650,26.626175))
  mapview(pollutant.2011[[1]])
  names(pollutant.2011) <- “pm25_2011” #rename layer
 
  ## 2012
  pollutant.2012 <- brick(file.path(input, “ACAG_PM25_V4GL03_201201_201212_0p05.nc”))
  pollutant.2012 <- raster::crop(pollutant.2012, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2012) <- “pm25_2012” #rename layer
 
  ## 2013
  pollutant.2013 <- brick(file.path(input, “ACAG_PM25_V4GL03_201301_201312_0p05.nc”))
  pollutant.2013 <- raster::crop(pollutant.2013, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2013) <- “pm25_2013” #rename layer
 
  ## 2014
  pollutant.2014 <- brick(file.path(input, “ACAG_PM25_V4GL03_201401_201412_0p05.nc”))
  pollutant.2014 <- raster::crop(pollutant.2014, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2014) <- “pm25_2014” #rename layer
 
  ## 2015
  pollutant.2015 <- brick(file.path(input, “ACAG_PM25_V4GL03_201501_201512_0p05.nc”))
  pollutant.2015 <- raster::crop(pollutant.2015, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2015) <- “pm25_2015” #rename layer
 
  ## 2016
  pollutant.2016 <- brick(file.path(input, “ACAG_PM25_V4GL03_201601_201612_0p05.nc”))
  pollutant.2016 <- raster::crop(pollutant.2016, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2016) <- “pm25_2016” #rename layer
 
  ## 2017
  pollutant.2017 <- brick(file.path(input, “ACAG_PM25_V4GL03_201701_201712_0p05.nc”))
  pollutant.2017 <- raster::crop(pollutant.2017, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2017) <- “pm25_2017” #rename layer
 
  ## 2018
  pollutant.2018 <- brick(file.path(input, “ACAG_PM25_V4GL03_201801_201812_0p05.nc”))
  pollutant.2018 <- raster::crop(pollutant.2018, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2018) <- “pm25_2018” #rename layer
 
  ## 2019
  pollutant.2019 <- brick(file.path(input, “ACAG_PM25_V4GL03_201901_201912_0p05.nc”))
  pollutant.2019 <- raster::crop(pollutant.2019, c(-18.979003,51.084828, -35.044650,26.626175))
  names(pollutant.2019) <- “pm25_2019” #rename layer
 
  ## Extracting for Africa counties
  africashapefile.98 <- cbind(africashapefile1, raster::extract(pollutant.98, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)) # you could change method = ‘bilinear’ for an interpolation of the 4 closest grid cells when calculating Pm2.5 average. You could also add the option: weight= TRUE, for an area weighted average.
  africashapefile.99 <- raster::extract(pollutant.99, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
 
  africashapefile.00 <- raster::extract(pollutant.2000, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.01 <- raster::extract(pollutant.2001, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.02 <- raster::extract(pollutant.2002, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.03 <- raster::extract(pollutant.2003, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.04 <- raster::extract(pollutant.2004, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.05 <- raster::extract(pollutant.2005, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.06 <- raster::extract(pollutant.2006, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.07 <- raster::extract(pollutant.2007, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.08 <- raster::extract(pollutant.2008, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.09 <- raster::extract(pollutant.2009, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.10 <- raster::extract(pollutant.2010, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.11 <- raster::extract(pollutant.2011, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.12 <- raster::extract(pollutant.2012, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.13 <- raster::extract(pollutant.2013, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.14 <- raster::extract(pollutant.2014, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.15 <- raster::extract(pollutant.2015, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.16 <- raster::extract(pollutant.2016, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.17 <- raster::extract(pollutant.2017, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.18 <- raster::extract(pollutant.2018, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
  africashapefile.19 <- raster::extract(pollutant.2019, africashapefile1.sf,  fun=mean, method=”simple”, df=TRUE)
 
  africa1.pm2.5 <- cbind(africashapefile.98, africashapefile.99,
                        africashapefile.00, africashapefile.01,
                        africashapefile.02, africashapefile.03,
                        africashapefile.04, africashapefile.05,
                        africashapefile.06, africashapefile.07,
                        africashapefile.08, africashapefile.09,
                        africashapefile.10, africashapefile.11,
                        africashapefile.12, africashapefile.13,
                        africashapefile.14, africashapefile.15,
                        africashapefile.16, africashapefile.17,
                        africashapefile.18, africashapefile.19
 
                        )
 
 
  africa1.pm2.5 <- africa1.pm2.5%>%
    dplyr::select(-ID)
 
 
  # Pivot dataframe from wide to Long (not needed but I like to).
  africa1.pm2.5.long <-   pivot_longer(africa1.pm2.5,
                                  cols = starts_with(“pm25_”), # Select columns to reshape.. all columns starting with pm25
                                  names_sep = “_”, # Character that divides variable name and suffix
                                  names_to =  c(“.value”, “year”),
                                  values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
                                  values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
 
  # saving file
  write_dta(africa1.pm2.5.long, file.path(final, “Africa_DHS_adm2_yearly_PM25full_1998_2019.dta”), version=14)

Conclusion

With these datasets and codes, you could explore many pollution-related topics.

Check out my working paper on Air Pollution, Avoidance Behavior and Labor Market outcomes: Evidence from the United States.

Best of luck with your work! Oh I forgot, PLEASE share the post & don’t hesitate to reach out (email: djoum003@umn.edu) if there is something confusing or if you need help with any of this. I am happy to help whenever I can! Cheers!

ALSO READ   Food Waste & Food Loss Data (Free Download)

Berenger Djoumessi

Berenger is a Ph.D. graduate from the University of Minnesota in Applied Economics, where he studied topics at the intersection of environmental, development, and agricultural economics. Contact: djoum003@umn.edu

Recent Posts