HES 505 Fall 2023: Session 20
By the end of today you should be able to:
Describe and implement statistical approaches to interpolation
Describe the case for identifying neighbors with areal data
Implement contiguity-based neighborhood detection approaches
Implement graph-based neighborhood detection approaches
Previous methods predict \(z\) as a (weighted) function of distance
Treat the observations as perfect (no error)
If we imagine that \(z\) is the outcome of some spatial process such that:
Basically a regression on the coordinates of your data points
Coefficients apply to the coordinates and their interaction
Relies on different functional forms
Simplest form of trend surface
\(Z=a\) where \(a\) is the mean value of air quality
Result is a simple horizontal surface where all values are the same.
#set up interpolation grid
# Create an empty grid where n is the total number of cells
grd <- as.data.frame(spsample(as(id.cty, "Spatial"), "regular", n=20000))
names(grd) <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd) <- TRUE # Create SpatialPixel object
fullgrid(grd) <- TRUE # Create SpatialGrid object
proj4string(grd) <- proj4string(as(aq.sum, "Spatial"))
# Define the polynomial equation
f.0 <- as.formula(meanpm25 ~ 1)
# Run the regression model
lm.0 <- lm( f.0 , data=aq.sum)
# Use the regression model output to interpolate the surface
dat.0th <- SpatialGridDataFrame(grd, data.frame(var1.pred = predict(lm.0, newdata=grd)))
# Convert to raster object to take advantage of rasterVis' imaging
# environment
r <- rast(dat.0th)
r.m <- mask(r, st_as_sf(id.cty))
tm_shape(r.m) + tm_raster( title="Predicted air quality") +tm_shape(aq.sum) + tm_dots(size=0.2) +
tm_legend(legend.outside=TRUE)
Creates a slanted surface
\(Z = a + bX + cY\)
X and Y are the coordinate pairs
# Define the polynomial equation
f.1 <- as.formula(meanpm25 ~ X + Y)
aq.sum$X <- st_coordinates(aq.sum)[,1]
aq.sum$Y <- st_coordinates(aq.sum)[,2]
# Run the regression model
lm.1 <- lm( f.1 , data=aq.sum)
# Use the regression model output to interpolate the surface
dat.1st <- SpatialGridDataFrame(grd, data.frame(var1.pred = predict(lm.1, newdata=grd)))
# Convert to raster object to take advantage of rasterVis' imaging
# environment
r <- rast(dat.1st)
r.m <- mask(r, st_as_sf(id.cty))
Produces a parabolic surface
\(Z = a + bX + cY + dX^2 + eY^2 + fXY\)
Highlights the interaction of both directions
# Define the 1st order polynomial equation
f.2 <- as.formula(meanpm25 ~ X + Y + I(X*X)+I(Y*Y) + I(X*Y))
# Run the regression model
lm.2 <- lm( f.2, data=aq.sum)
# Use the regression model output to interpolate the surface
dat.2nd <- SpatialGridDataFrame(grd, data.frame(var1.pred = predict(lm.2, newdata=grd)))
r <- rast(dat.2nd)
r.m <- mask(r, st_as_sf(id.cty))
tm_shape(r.m) + tm_raster(n=10, palette="RdBu", title="Predicted air quality") +tm_shape(aq.sum) + tm_dots(size=0.2) +
tm_legend(legend.outside=TRUE)
Previous methods predict \(z\) as a (weighted) function of distance
Treat the observations as perfect (no error)
If we imagine that \(z\) is the outcome of some spatial process such that:
\[ \begin{equation} z(\mathbf{x}) = \mu(\mathbf{x}) + \epsilon(\mathbf{x}) \end{equation} \]
then any observed value of \(z\) is some function of the process (\(\mu(\mathbf{x})\)) and some error (\(\epsilon(\mathbf{x})\))
Correlation the tendency for two variables to be related
Autocorrelation the tendency for observations that are closer (in space or time) to be correlated
Positive autocorrelation neighboring observations have \(\epsilon\) with the same sign
Negative autocorrelation neighboring observations have \(\epsilon\) with a different sign (rare in geography)
\[ \begin{equation} z(\mathbf{x}) = \mu + \epsilon(\mathbf{x}) \end{equation} \]
Mean and variance need to be constant across study area
Trend surfaces indicate that is not the case
Need to remove that trend
nugget - the proportion of semivariance that occurs at small distances
sill - the maximum semivariance between pairs of observations
range - the distance at which the sill occurs
experimental vs. fitted variograms
var.cld <- gstat::variogram(res ~ 1, aq.sum, cloud = TRUE)
var.df <- as.data.frame(var.cld)
index1 <- which(with(var.df, left==21 & right==2))
OP <- par( mar=c(4,6,1,1))
plot(var.cld$dist/1000 , var.cld$gamma, col="grey",
xlab = "Distance between point pairs (km)",
ylab = expression( frac((res[2] - res[1])^2 , 2)) )
# Compute the sample experimental variogram
var.smpl <- gstat::variogram(f.2, aq.sum, cloud = FALSE)
bins.ct <- c(0, var.smpl$dist , max(var.cld$dist) )
bins <- vector()
for (i in 1: (length(bins.ct) - 1) ){
bins[i] <- mean(bins.ct[ seq(i,i+1, length.out=2)] )
}
bins[length(bins)] <- max(var.cld$dist)
var.bins <- findInterval(var.cld$dist, bins)
# Point data cloud with bin boundaries
OP <- par( mar = c(5,6,1,1))
plot(var.cld$gamma ~ eval(var.cld$dist/1000), col=rgb(0,0,0,0.2), pch=16, cex=0.7,
xlab = "Distance between point pairs (km)",
ylab = expression( gamma ) )
points( var.smpl$dist/1000, var.smpl$gamma, pch=21, col="black", bg="red", cex=1.3)
abline(v=bins/1000, col="red", lty=2)
var.smpl <- gstat::variogram(f.2, aq.sum, cloud = FALSE, cutoff = 1000000)
# Compute the variogram model by passing the nugget, sill and range values
# to fit.variogram() via the vgm() function.
dat.fit <- gstat::fit.variogram(var.smpl, gstat::vgm(nugget = 12, range= 60000, model="Gau", cutoff=1000000))
[using ordinary kriging]
[using universal kriging]
dat.krg <- gstat::krige( f.2, as(aq.sum, "Spatial"), grd, dat.fit)
r <- rast(dat.krg)$var1.pred
r.m <- mask(r, st_as_sf(id.cty))
# Plot the raster and the sampled points
tm_shape(r.m) + tm_raster(n=10, palette="RdBu", title="Predicted air quality") +tm_shape(aq.sum) + tm_dots(size=0.2) +
tm_legend(legend.outside=TRUE)
Assumes that the deterministic part of the process (\(\mu(\mathbf{x})\)) is now a function of the location \(\mathbf{x}\)
Could be the location or some other attribute
Now y
is a function of some aspect of x
relies on autocorrelation in \(\epsilon_1(\mathbf{x})\) for \(z_1\) AND cross correlation with other variables (\(z_{2...j}\))
Extending the ordinary kriging model gives:
\[ \begin{equation} z_1(\mathbf{x}) = \mu_1 + \epsilon_1(\mathbf{x})\\ z_2(\mathbf{x}) = \mu_2 + \epsilon_2(\mathbf{x}) \end{equation} \] * Note that there is autocorrelation within both \(z_1\) and \(z_2\) (because of the \(\epsilon\)) and cross-correlation (because of the location, \(\mathbf{x}\))
gstat
callsgCoK <- gstat(NULL, 'log.zinc', log(zinc)~1, meuse, locations=~x+y)
gCoK <- gstat(gCoK, 'elev', elev~1, meuse, locations=~x+y)
gCoK <- gstat(gCoK, 'cadmium', cadmium~1, meuse, locations=~x+y)
coV <- variogram(gCoK)
coV.fit <- fit.lmc(coV, gCoK, vgm(model='Sph', range=1000))
coK <- interpolate(r, coV.fit, debug.level=0)