When working on research papers, an important decision often arises: “Whether and how to cluster the standard errors.” Abadie, Athey, Imbens, and Wooldridge wrote a recent paper on “When Should You Adjust Standard Errors for Clustering”.

The two reasons they identify are sampling design (i.e., the sample was drawn from the population using clustered sampling) and experimental design (i.e., treatment is assigned to clusters of units).

In practice, it is common to cluster standard errors at the level of the treatment. For example, if the treatment is at the village level or state level, we often cluster errors at that level.

When working with spatial or remote sensing data (e.g., weather, internet, crime), another problem often arises that requires clustering standard errors differently from the traditional way: spatial dependence across units, meaning that nearby units are more similar than distant ones, or that a unit's present values reflect its own characteristics in the recent or distant past.

Today I discuss how to spatially cluster standard errors in Stata following Conley (1999) and the challenges encountered in an instrumental variable (IV) context.

## The Issue of Spatial Dependence

If you visually explore data (e.g., with a heatmap) on variables such as crime, income, crop production, population density, rainfall, or internet connectivity, you might notice clustering in the values across administrative units.

**Figure 1**: Spatial Distribution of Terrorist Attacks after 2006 in Part of India

Source: Fetzer (2014)

This clustering raises the issue of spatial dependence among units, which violates the important assumption that error terms are independently and identically distributed (i.i.d.).

Spatial dependence reduces the effective number of independent observations, which may inflate the t-statistics and distort the significance levels.

Donaldson and Storeygard (2016) point out that it is important to distinguish whether the spatial dependence lies in the outcome variable or in the independent variable of interest.

The violation of the i.i.d. assumption occurs when the spatial dependence is in the dependent variable.

A 2019 paper by Morgan Kelly examines 27 persistence-regression papers in four top journals (*The American Economic Review*, *Econometrica*, *The Journal of Political Economy*, and *The Quarterly Journal of Economics*) to assess the degree of spatial correlation in their data.

The author finds that when the outcome is replaced with spatial noise, only around one-quarter of the studies retain robust results.

I have observed that few papers implement these spatially adjusted standard errors even when their outcomes display patterns of spatial dependence.

**The advice**: first run a Moran's I test to assess the presence of spatial clustering in the error terms; if you reject the null hypothesis of no spatial correlation in the residuals, spatially cluster the standard errors following Conley (1999).
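As a sketch, such a test can be run with Stata's built-in Sp commands (Stata 15 or later); the variable names below are hypothetical:

```stata
* Declare the data as spatial using village coordinates (hypothetical names)
spset villageID, coord(villagelon villagelat)

* Build an inverse-distance spatial weighting matrix
spmatrix create idistance W

* Fit the regression, then test the residuals for spatial correlation
regress outcome treatment controls
estat moran, errorlag(W)
```

A small p-value rejects the null of no spatial correlation in the residuals, suggesting that spatially adjusted standard errors are warranted.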

The next question is: “*How do I implement these Conley standard errors?*”. Let’s discuss how to do it in Stata.

## How to Spatially Cluster Standard Errors at Different Cutoff Distances in Stata

One way studies often account for spatial dependence is by calculating the adjusted standard errors using the procedure developed by Timothy Conley (1999), which relies on a Generalized Method of Moments approach and computes a variance-covariance matrix with spatial weights.

These standard errors account for heteroscedasticity, unit-specific serial correlation, and cross-sectional spatial correlation.

Thanks to the incredible work of Solomon Hsiang (2010) and Thiemo Fetzer (2014), there is a way to implement these spatially adjusted standard errors in Stata.

### In a Linear Regression Model

Consider a (hypothetical) shock such as a hurricane or a flood affecting part of a country, and suppose we are interested in identifying its impact on crop production or crime.

Suppose you have 5-year panel data, but the shock occurred after the third wave. Our difference-in-differences (DiD) specification could be:

*Y = a + b×treat + c×post + d×treat×post + f×X + e,*

where d represents the impact of the shock on our outcome Y, and e represents the error term. Treat is a dummy equal to 1 if the village was flooded and 0 otherwise; post takes the value 1 for waves 4 and 5 and 0 otherwise.
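A minimal sketch of constructing these variables in Stata, assuming the survey wave number is stored in a (hypothetical) variable called wave:

```stata
* Post indicator: 1 for waves 4 and 5, 0 otherwise
gen post = (wave >= 4)

* Interaction term whose coefficient captures the DiD impact
gen treat_post = treat * post
```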

If the treatment is constructed at the village level, for example, it is common in the literature to cluster standard errors at that level. In Stata, it would be:

*reghdfe outcome treatment post post#treatment controls, cluster(village)*

However, a shock such as flooding is not always confined within a village's administrative boundaries, and a heatmap of crop production may also show local correlation among nearby units.

We may find a significant negative impact of the shock on crop production when clustering at the village level, but calculating spatially adjusted standard errors at cutoff radii larger than the village is likely to yield a weak impact or none at all.

Suppose we have the centroids or geocoordinates of the villages in question; the Conley-adjusted standard errors can then be implemented in Stata as follows:

- Using Hsiang's code, the Stata command would be:

*ols_spatial_HAC outcome treatment post post#treatment controls, lat(villagelat) lon(villagelon) timevar(year) panelvar(villageID) dist(300) lag(2)*

In the specification above, I assume a cutoff radius of 300 kilometers (i.e., the spatial correlation between the error terms of two units is assumed to be zero beyond 300 km), and the temporal autocorrelation is set to 2 time periods.

One issue with Hsiang's code is that Stata becomes slow when adding many controls.

- Fetzer improved upon the code by building the wrapper command “reg2hdfespatial” on top of the popular **reghdfe** command, which also allows for high-dimensional fixed effects.

An example is:

*reg2hdfespatial outcome treatment post post#treatment controls, lat(villagelat) lon(villagelon) timevar(year) panelvar(villageID) dist(300) lag(2)*

### In an Instrumental Variable Specification

During my time as a World Bank Robert McNamara fellow, one of the research projects I worked on relied on the instrumental variable strategy for identification. The issue is that the instrument also showed patterns of spatial dependence.

Neither the Hsiang nor the Fetzer code applies to the IV setting.

Tim Foreman developed an adapted version of the Hsiang and Fetzer codes to the IV setting called “*spatial_hac_iv*”.

An example would be:

*spatial_hac_iv outcome controls (treatment = instrument), id(villageID) time(year) latitude(villagelat) longitude(villagelon) dist(300) lag(2)*

An alternative to this command is the acreg command. An example is:

*acreg outcome controls (treatment = instrument), id(villageID) time(year) latitude(villagelat) longitude(villagelon) dist(300) lag(2) spatial hac pfe1(villageID) pfe2(year)*

Note: the pfe1() and pfe2() options partial out the village and year fixed effects.
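To see how much the adjustment matters, one option is to store both sets of estimates and tabulate them side by side. This is a sketch assuming the estout package is installed, that acreg posts its results in e() like standard estimation commands, and using the hypothetical variable names from above:

```stata
* Conventional clustered SEs via built-in 2SLS
eststo clear
eststo conventional: ivregress 2sls outcome controls (treatment = instrument), vce(cluster village)

* Conley spatially adjusted SEs via acreg
eststo conley: acreg outcome controls (treatment = instrument), id(villageID) time(year) ///
    latitude(villagelat) longitude(villagelon) dist(300) lag(2) spatial hac

* Side-by-side table of coefficients and standard errors
esttab conventional conley, se
```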

## Challenges with the IV Setting

The main challenge I encountered with this adapted code is that it is very slow to run as the number of observations increases.

For example, on a sample of 1,000 observations it took about 1 minute to produce results; on 5,000 observations, about 30 minutes; and on 100,000 observations, my relatively “powerful” laptop crashed after days of running.

## What is the Appropriate Radius Cutoff?

Another question could be: “*What is the recommended cutoff distance I should use when implementing these spatially adjusted standard errors?*” There does not seem to be a consensus in the literature on an exact number, and different papers have used different cutoffs. For example, Nunn and Wantchekon (2011) examine how the historical slave trade relates to the level of trust in African countries, which may affect their current economic development.

They use a cutoff of about 3 degrees for Africa. The authors find similar standard errors when compared to the traditional clustered standard errors at the ethnicity level, or the clustered standard errors at both the ethnic group and district levels. Hsiang (2010) uses a cutoff distance of 300 km.

**My suggestion**: implement different cutoffs in the robustness tests to show at which cutoff distances your significance holds and at which the results start becoming weak or insignificant.
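One way to implement this robustness check is to loop over a set of cutoff radii; a sketch using Fetzer's wrapper and the hypothetical variable names from the earlier examples:

```stata
* Re-estimate the model at several cutoff radii (in km) and compare significance
foreach cutoff in 50 100 200 300 500 {
    display "Cutoff distance: `cutoff' km"
    reg2hdfespatial outcome treatment post post#treatment controls, ///
        lat(villagelat) lon(villagelon) timevar(year) panelvar(villageID) ///
        dist(`cutoff') lag(2)
}
```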

### Other interesting resources on the topic of spatial clustering of SEs

- *Randomly Drawn Equators?* by Florence Kondylis and John Loeser
- *Clustering, Spatial Correlations and Randomization Inference* by Barrios et al. (2010)
- *Introduction to conleyreg* by Christian Düben