
Want to download hourly, monthly, and annual pollution data? You’ve come to the right place!
Below I show you where to download pollution data from multiple sources and how to easily extract the value of pollutants (e.g., PM 2.5, PM 10, nitrogen, carbon monoxide, Ozone, etc.) for any area, region, or point coordinates you may want.
Before going any further, check out my working paper on Air Pollution, Avoidance Behavior, and Labor Market outcomes: Evidence from the United States, which shows how I use different pollution data below for the purpose.
Source 1: The CAMS Atmosphere Data Store (ADS)

- FREE
- Date range: 2003 – Present
- Coverage: Global
- 3-hourly (every 3 hours) & Monthly
- Resolution: 0.75 deg * 0.75 deg (approximately 80 km). Perhaps too big for you.
- Format: .grib or netCDF files
Other info: This dataset contains almost ALL possible atmospheric variables (PM2.5, PM1, PM10, ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), Dust, Aerosols, Black Carbon, Methane, and so much more).
Click Here to Download (2003-Present).
Source 2: Global Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD) with GWR
- FREE
- Date range: 1998 – 2019
- Coverage: Global
- Annual Data
- Resolution: 0.01 degrees (approx. 1km sq for each grid)
- Format: .geotiff files
Click Here to Download (1998-2019).
About the Method
” Geographically Weighted Regression (GWR) is used with global ground-based measurements from the World Health Organization (WHO) database to predict and adjust for the residual PM2.5 bias per grid cell in the initial satellite-derived values. These estimates are primarily intended to aid in large-scale studies. Gridded data sets are provided at a resolution of 0.01 degrees to allow users to agglomerate data as best meets their particular needs. Data sets are gridded at the finest resolution of the information sources that were incorporated, but do not fully resolve PM2.5 gradients at the gridded resolution due to influence by information sources at coarser resolution. The data are distributed as GeoTIFF files and are in WGS84 projection.”
Source 3: Sentinel-5P OFFL AER AI: Offline UV Aerosol Index
- FREE
- Date range: 2018-07-04 to Present
- Coverage: Global
- Daily Data
- Resolution: 1113.2 meters
- Format: maybe .netcdf (I don’t know how to process Google Earth Engine data yet). You can watch this YouTube video below to learn more.
Other info: This is probably one of the lowest resolution data on pollution. The issue is the time period
This dataset provides offline high-resolution imagery of the UV Aerosol Index (UVAI), also called the Absorbing Aerosol Index (AAI).
When the AAI is positive, it indicates the presence of UV-absorbing aerosols like dust and smoke. It is useful for tracking the evolution of episodic aerosol plumes from dust outbreaks, volcanic ash, and biomass burning.
Click Here to Access the Sentinel Data
Source 4: WUSTL Pollution Data
Ground-level fine particulate matter (PM2.5). You can also derive Dust and Sea-salt from it too.
- FREE
- Date range: 1998-2020
- Coverage: Global
- annual and monthly
- Resolution: 0.01° × 0.01° degrees (approx. 1km sq for each grid); 0.05° × 0.05°
- Format: .netcdf (.nc)
Click Here to Download (1998-2019).
You can also get surface SO2 HERE.
Source 5: CACES Pollution Data
This dataset tracks six pollutants:
Four gases: O3, CO, SO2, NO2; two aerosols: PM10, PM2.5) throughout the contiguous U.S.
- FREE
- Coverage: United States only
- Date range: 1979 – 2015 (SO2, NO2), 1988 – 2015 (PM10), 1990-2015 (CO), 1999-2015 (PM2.5) and 1979-2015 (O3)
- annual and monthly
- Resolution: Data are available at national, state, county, census tract, and census block group levels
- Format: .excel
Click Here to Download (1998-2019).
Make sure you put your email and org name before proceeding.
Then select your desired pollutant.
Source 6: EPA Pollution Data (EPA = Environmental Protection Agency)
Time range: 1999 – Present
Pollutants covered: PM2.5, Pm 10, Ozone, Air Quality Index (AQI), Ozone etc.
Type: Daily, Monthly
Coverage: United States only
Data format: Excel files
There are two ways to download the EPA:
Source 1: Pregenerated files to download directly, daily average by county
Below is what the page looks like:

The data in each excel will cover all monitors across the whole U.S.
Source 2: EPA Query tool
Below is a screenshot of how you can download it:
TIP: I would suggest downloading the excel files if your study covers the U.S. and many years, or using the query tool if you just want data for a state or specific monitoring sites. I used the excel files for my study on Air Pollution, Avoidance Behavior, and Labor Market outcomes: Evidence from the United States. Check it out. 😉
R Codes to Extract Pollutants from Different Datasets
The codes below only apply to the spatial datasets and not the EPA data.
Example Code to Extract the CAMS ADS Data for China counties
You just need to change your shapefile for your country (at any admin level) to process yours as well.
This is what the final dataset will look like:
The final values are the county-level average of pollutants by month & year.
#GOAL: Get and process monthly Pollutants data for CHINA Counties from 2010-2020 from CAMS Satellite data
# Data source: https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-reanalysis-eac4?tab=form
rm(list=ls())
library(sp)
library(raster)
library(spatstat)
library(rgdal)
library(maptools)
library(tmap)
library(tmaptools)
library(ncdf4)
library(leaflet)
library(dplyr)
library(plyr)
library(sf)
library(tidyr)
library(data.table)
library(haven)
library(ncdf4)
library(foreign)
library(ggpubr)
library(pacman) # to use pivot to reshape
gc()
setwd (“C:/Users/dbere/Dropbox/COLLABS/subsidy, crop residue burn, air quality/inputdata/pollution”)
########################################################
# Processing PM 2.5 Satellite CAMS data on same days
#######################################################
china <- raster::getData(name=”GADM”, country=”CHN”, level=3)
china <- as(china, “sf”)
chinamap <- st_read(“C:/Users/dbere/Dropbox/COLLABS/subsidy, crop residue burn, air quality/sample counties china/sample_counties1.shp”)
# listing files in a vector
all.pollution.files<- list.files( “.”, pattern = “.*.nc$”, recursive = TRUE, full.names = FALSE) # we are not loading full path because we will use each file name in a loop
########################################################
# Processing PM 10 Satellite CAMS data Monthly county-level
#######################################################
pollutant <- brick(all.pollution.files[1])
pollutant
## Extracting for US Counties monthly
chn.monthly.pol.extract <- cbind(chinamap, raster::extract(pollutant, chinamap, fun=mean, weight=TRUE, method=”simple”, df=TRUE))
chn.monthly.pol.extract$ID <- NULL # Removing the ID column
###############
# Process for all pollutants at ONCE
for (i in 2:length(all.pollution.files)) {
newfile <- brick(all.pollution.files[i]) # this will be the second file in first iteration and so on
#Extracting pollutant file for china map
extractpol <- cbind(chinamap, raster::extract(newfile, chinamap,
fun=mean, weight=TRUE, method=”simple”, df=TRUE))
extractpol$ID <- NULL # removing column ID
j<- gsub(“_2009_2020.nc”, “”, all.pollution.files[i], fixed = TRUE) # the name of the file in “i” is assigned to object j without 2009_2020.nc
assign(j, extractpol) #this assign the name extracted from i to the new object or data table extractpol
gc()
}
#################
# Fixing variable names
# Removing unnecessary characters in variable names ( currently vars have the form X2010.01) = transform it to pm10_2010_01_01 so that it can be used in other programs (Stata)
#—————–
# Exporting Data For PM 10
names(chn.monthly.pol.extract) <-gsub(“\\X”,”pm10_”,names(chn.monthly.pol.extract))
names(chn.monthly.pol.extract) <-gsub(“\\.”,””,names(chn.monthly.pol.extract))
chn.monthly.pol.extract.long <- pivot_longer(st_drop_geometry(chn.monthly.pol.extract),
cols = starts_with(“pm10_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(chn.monthly.pol.extract.long, “./sat_pm10_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For PM2.5
names(CAMS_monthly_avg_pm2_5_CHINA) <-gsub(“\\X”,”pm25_”,names(CAMS_monthly_avg_pm2_5_CHINA))
names(CAMS_monthly_avg_pm2_5_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_pm2_5_CHINA))
CAMS_monthly_avg_pm2_5_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_pm2_5_CHINA),
cols = starts_with(“pm25_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_pm2_5_CHINA.long, “./sat_pm25_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For Aerosols
names(CAMS_monthly_avg_total_aerosols_550nm_CHINA) <-gsub(“\\X”,”aerosols_”,names(CAMS_monthly_avg_total_aerosols_550nm_CHINA))
names(CAMS_monthly_avg_total_aerosols_550nm_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_aerosols_550nm_CHINA))
CAMS_monthly_avg_total_aerosols_550nm_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_aerosols_550nm_CHINA),
cols = starts_with(“aerosols_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_total_aerosols_550nm_CHINA.long, “./sat_aerosols_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For Carbone Monoxide
names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA) <-gsub(“\\X”,”carbmonoxide_”,names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA))
names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA))
CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA),
cols = starts_with(“carbmonoxide_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long, “./sat_carbmonoxide_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For Nitrogen
names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA) <-gsub(“\\X”,”nitrogendioxide_”,names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA))
names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA))
CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA),
cols = starts_with(“nitrogendioxide_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long, “./sat_nitrogendioxide_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For Nitrogen
names(CAMS_monthly_avg_total_column_ozone_CHINA) <-gsub(“\\X”,”ozone_”,names(CAMS_monthly_avg_total_column_ozone_CHINA))
names(CAMS_monthly_avg_total_column_ozone_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_ozone_CHINA))
CAMS_monthly_avg_total_column_ozone_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_ozone_CHINA),
cols = starts_with(“ozone_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_total_column_ozone_CHINA.long, “./sat_ozone_counties_monthly_china_long_2009_2020.dta”, version=14)
#—————–
# Exporting Data For Sulf Dioxide
names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA) <-gsub(“\\X”,”sulfdioxiode_”,names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA))
names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA) <-gsub(“\\.”,””,names(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA))
CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long <- pivot_longer(st_drop_geometry(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA),
cols = starts_with(“sulfdioxiode_”), # Select columns to reshape
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “yearmonth”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
write_dta(CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long, “./sat_sulfdioxiode_counties_monthly_china_long_2009_2020.dta”, version=14)
###################################
## APPEND MULTIPLE FILES AT ONCE
##################################
allfiles <- cbind(chn.monthly.pol.extract.long, CAMS_monthly_avg_pm2_5_CHINA.long[,12], CAMS_monthly_avg_total_aerosols_550nm_CHINA.long[,12],
CAMS_monthly_avg_total_column_carbon_monoxide_CHINA.long[,12], CAMS_monthly_avg_total_column_nitrogen_dioxide_CHINA.long[,12],
CAMS_monthly_avg_total_column_ozone_CHINA.long[,12], CAMS_monthly_avg_total_column_sulphur_dioxide_CHINA.long[,12])
# Multiplying values by the corresponding multipler: PM pollutants have to be multiplied by 1 billion to have regular known values
# All units are described here at bottom of: https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-reanalysis-eac4?tab=overview
allfiles <- allfiles%>%
mutate(pm10 = pm10* 1000000000, pm25 = pm25* 1000000000, # from kg/m3 to mg/m3 is 1 billion factor
)
write_dta(allfiles, “./all_pollutants_counties_monthly_china_long_2009_2020.dta”, version=14
Example Code to Extract the WUSTL PM2.5 Data for Africa Admin 3 shapefiles
Make sure you change all the local paths and adapt to your context.
#GOAL: Get Daily Pollutant data from 1998-2019 from CAMS Satellite data
# Data source: https://sites.wustl.edu/acag/datasets/surface-pm2-5/#V5.GL.02
rm(list=ls())
library(sp)
library(raster)
library(spatstat)
library(rgdal)
library(maptools)
library(tmap)
library(tmaptools)
library(ncdf4)
library(leaflet)
library(dplyr)
library(plyr)
library(sf)
library(tidyr)
library(data.table)
library(haven)
library(ncdf4)
library(foreign)
library(ggpubr)
library(mapview)
library(pacman) # to use pivot to reshape
library(stars)
library(rgdal)
gc()
final <-“C:/Documents/Data/otherdata/WUSTL_pollution_data_annual/WUSTL_DHS_Africa_pm25”
########################################################
# Processing PM 2.5 Satellite WUSTL yearly data
#######################################################
input <-“./Data/otherdata/WUSTL_pollution_data_annual/PM25_full_annual/0.05 degrees”
# # DHS Africa Map
africashapefile.sf<- st_read(“C:/Users/dbere/Documents/DATA/DHS DATA/Africa shapefiles/africa_shape_dhs_countries.shp”)
africashapefile.sf # this allows to get extent of boundary box containing almost all African countries. I constructed this map myself to use in different situations. Use your own map. Check the GADM website to download country maps/shapefiles.
#Selecting some countries
africashapefile1.sf <- africashapefile.sf%>%
dplyr::filter(grepl(‘Angola|Benin|Burkina Faso|Burundi|Cameroon|Chad|Congo Democratic Republic|Ivoire|Ethiopia|Ghana|Guinea|Kenya’, NAME_0)) # forma large map I constructed, I am just selecting some countries I want to extract for
africashapefile1 <- st_drop_geometry(africashapefile1.sf) # dropping column geometry in this clone dataframe. Will be used later.
levels(as.factor(africashapefile1.sf$NAME_0)) # checking the country names
africashapefile1 <- africashapefile1 %>% #selecting important columns in dataframe
dplyr::select(GID_0, NAME_0, GID_1, NAME_1, GID_2, NAME_2, GID_3, NAME_3, GID_4, NAME_4)
africashapefile1.sf <- africashapefile1.sf %>% # selecting only geometry column
dplyr::select(NL_NAME_1)
########################################################
# Processing PM 2.5 WUSTL
#######################################################
st_bbox(africashapefile1.sf)
# Open and crop to Africa extent
## 1998
pollutant.98 <- brick(file.path(input, “ACAG_PM25_V4GL03_199801_199812_0p05.nc”))
pollutant.98
pollutant.98 <- raster::crop(pollutant.98, c(-18.979003,51.084828, -35.044650,26.626175))
#mapview(pollutant.98[[1]])
names(pollutant.98) <- “pm25_1998” #rename layer
## 1999
pollutant.99 <- brick(file.path(input, “ACAG_PM25_V4GL03_199901_199912_0p05.nc”))
pollutant.99 <- raster::crop(pollutant.99, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.99) <- “pm25_1999” #rename layer
## 2000
pollutant.2000 <- brick(file.path(input, “ACAG_PM25_V4GL03_200001_200012_0p05.nc”))
pollutant.2000 <- raster::crop(pollutant.2000, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2000) <- “pm25_2000” #rename layer
## 2001
pollutant.2001 <- brick(file.path(input, “ACAG_PM25_V4GL03_200101_200112_0p05.nc”))
pollutant.2001 <- raster::crop(pollutant.2001, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2001) <- “pm25_2001” #rename layer
## 2002
pollutant.2002 <- brick(file.path(input, “ACAG_PM25_V4GL03_200201_200212_0p05.nc”))
pollutant.2002 <- raster::crop(pollutant.2002, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2002) <- “pm25_2002” #rename layer
## 2003
pollutant.2003 <- brick(file.path(input, “ACAG_PM25_V4GL03_200301_200312_0p05.nc”))
pollutant.2003 <- raster::crop(pollutant.2003, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2003) <- “pm25_2003” #rename layer
## 2004
pollutant.2004 <- brick(file.path(input, “ACAG_PM25_V4GL03_200401_200412_0p05.nc”))
pollutant.2004 <- raster::crop(pollutant.2004, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2004) <- “pm25_2004” #rename layer
## 2005
pollutant.2005 <- brick(file.path(input, “ACAG_PM25_V4GL03_200501_200512_0p05.nc”))
pollutant.2005 <- raster::crop(pollutant.2005, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2005) <- “pm25_2005” #rename layer
## 2006
pollutant.2006 <- brick(file.path(input, “ACAG_PM25_V4GL03_200601_200612_0p05.nc”))
pollutant.2006 <- raster::crop(pollutant.2006, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2006) <- “pm25_2006” #rename layer
## 2007
pollutant.2007 <- brick(file.path(input, “ACAG_PM25_V4GL03_200701_200712_0p05.nc”))
pollutant.2007 <- raster::crop(pollutant.2007, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2007) <- “pm25_2007” #rename layer
## 2008
pollutant.2008 <- brick(file.path(input, “ACAG_PM25_V4GL03_200801_200812_0p05.nc”))
pollutant.2008 <- raster::crop(pollutant.2008, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2008) <- “pm25_2008” #rename layer
## 2009
pollutant.2009 <- brick(file.path(input, “ACAG_PM25_V4GL03_200901_200912_0p05.nc”))
pollutant.2009 <- raster::crop(pollutant.2009, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2009) <- “pm25_2009” #rename layer
## 2010
pollutant.2010 <- brick(file.path(input, “ACAG_PM25_V4GL03_201001_201012_0p05.nc”))
pollutant.2010 <- raster::crop(pollutant.2010, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2010) <- “pm25_2010” #rename layer
## 2011
pollutant.2011 <- brick(file.path(input, “ACAG_PM25_V4GL03_201101_201112_0p05.nc”))
pollutant.2011 <- raster::crop(pollutant.2011, c(-18.979003,51.084828, -35.044650,26.626175))
mapview(pollutant.2011[[1]])
names(pollutant.2011) <- “pm25_2011” #rename layer
## 2012
pollutant.2012 <- brick(file.path(input, “ACAG_PM25_V4GL03_201201_201212_0p05.nc”))
pollutant.2012 <- raster::crop(pollutant.2012, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2012) <- “pm25_2012” #rename layer
## 2013
pollutant.2013 <- brick(file.path(input, “ACAG_PM25_V4GL03_201301_201312_0p05.nc”))
pollutant.2013 <- raster::crop(pollutant.2013, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2013) <- “pm25_2013” #rename layer
## 2014
pollutant.2014 <- brick(file.path(input, “ACAG_PM25_V4GL03_201401_201412_0p05.nc”))
pollutant.2014 <- raster::crop(pollutant.2014, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2014) <- “pm25_2014” #rename layer
## 2015
pollutant.2015 <- brick(file.path(input, “ACAG_PM25_V4GL03_201501_201512_0p05.nc”))
pollutant.2015 <- raster::crop(pollutant.2015, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2015) <- “pm25_2015” #rename layer
## 2016
pollutant.2016 <- brick(file.path(input, “ACAG_PM25_V4GL03_201601_201612_0p05.nc”))
pollutant.2016 <- raster::crop(pollutant.2016, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2016) <- “pm25_2016” #rename layer
## 2017
pollutant.2017 <- brick(file.path(input, “ACAG_PM25_V4GL03_201701_201712_0p05.nc”))
pollutant.2017 <- raster::crop(pollutant.2017, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2017) <- “pm25_2017” #rename layer
## 2018
pollutant.2018 <- brick(file.path(input, “ACAG_PM25_V4GL03_201801_201812_0p05.nc”))
pollutant.2018 <- raster::crop(pollutant.2018, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2018) <- “pm25_2018” #rename layer
## 2019
pollutant.2019 <- brick(file.path(input, “ACAG_PM25_V4GL03_201901_201912_0p05.nc”))
pollutant.2019 <- raster::crop(pollutant.2019, c(-18.979003,51.084828, -35.044650,26.626175))
names(pollutant.2019) <- “pm25_2019” #rename layer
## Extracting for Africa counties
africashapefile.98 <- cbind(africashapefile1, raster::extract(pollutant.98, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)) # you could change method = ‘bilinear’ for an interpolation of the 4 closest grid cells when calculating Pm2.5 average. You could also add the option: weight= TRUE, for an area weighted average.
africashapefile.99 <- raster::extract(pollutant.99, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.00 <- raster::extract(pollutant.2000, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.01 <- raster::extract(pollutant.2001, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.02 <- raster::extract(pollutant.2002, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.03 <- raster::extract(pollutant.2003, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.04 <- raster::extract(pollutant.2004, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.05 <- raster::extract(pollutant.2005, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.06 <- raster::extract(pollutant.2006, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.07 <- raster::extract(pollutant.2007, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.08 <- raster::extract(pollutant.2008, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.09 <- raster::extract(pollutant.2009, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.10 <- raster::extract(pollutant.2010, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.11 <- raster::extract(pollutant.2011, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.12 <- raster::extract(pollutant.2012, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.13 <- raster::extract(pollutant.2013, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.14 <- raster::extract(pollutant.2014, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.15 <- raster::extract(pollutant.2015, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.16 <- raster::extract(pollutant.2016, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.17 <- raster::extract(pollutant.2017, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.18 <- raster::extract(pollutant.2018, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africashapefile.19 <- raster::extract(pollutant.2019, africashapefile1.sf, fun=mean, method=”simple”, df=TRUE)
africa1.pm2.5 <- cbind(africashapefile.98, africashapefile.99,
africashapefile.00, africashapefile.01,
africashapefile.02, africashapefile.03,
africashapefile.04, africashapefile.05,
africashapefile.06, africashapefile.07,
africashapefile.08, africashapefile.09,
africashapefile.10, africashapefile.11,
africashapefile.12, africashapefile.13,
africashapefile.14, africashapefile.15,
africashapefile.16, africashapefile.17,
africashapefile.18, africashapefile.19
)
africa1.pm2.5 <- africa1.pm2.5%>%
dplyr::select(-ID)
# Pivot dataframe from wide to Long (not needed but I like to).
africa1.pm2.5.long <- pivot_longer(africa1.pm2.5,
cols = starts_with(“pm25_”), # Select columns to reshape.. all columns starting with pm25
names_sep = “_”, # Character that divides variable name and suffix
names_to = c(“.value”, “year”),
values_to = “.value”, # New columns to be created (“.value” will be calculated from the parsing of names_sep)
values_drop_na = FALSE) # Only use this line=TRUE if you are sure all observations here are NULL
# saving file
write_dta(africa1.pm2.5.long, file.path(final, “Africa_DHS_adm2_yearly_PM25full_1998_2019.dta”), version=14)
Conclusion
With these datasets and codes, you could explore many pollution-related topics.
Check out my working paper on Air Pollution, Avoidance Behavior and Labor Market outcomes: Evidence from the United States.
Best of luck with your work! Oh I forgot, PLEASE share the post & don’t hesitate to reach out (email: djoum003@umn.edu) if there is something confusing or if you need help with any of this. I am happy to help whenever I can! Cheers!