HES 505 Fall 2023: Session 19
By the end of today you should be able to:
Distinguish deterministic and stochastic processes
Define autocorrelation and describe its estimation
Articulate the benefits and drawbacks of spatial autocorrelation
Leverage point patterns and autocorrelation to interpolate missing data
A spatial process is a description of how a spatial pattern might be generated
Generative models
An observed pattern as a possible realization of a hypothesized process
Deterministic: \[ z = 2x + 3y \]
Stochastic (with a random deviate \(d\)): \[ z = 2x + 3y + d \]
Considering each outcome as the realization of a process allows us to generate expected values
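A minimal sketch in R contrasting the two equations above (coordinates and coefficients are illustrative):

set.seed(42)
x <- runif(100); y <- runif(100)        # hypothetical coordinates
z_det <- 2 * x + 3 * y                  # deterministic: identical every run
z_sto <- 2 * x + 3 * y + rnorm(100)     # stochastic: one realization; rerun for another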
The simplest spatial process is complete spatial randomness (CSR)
First Order effects: any event has an equal probability of occurring in a location
Second Order effects: the location of one event is independent of the other events
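A minimal sketch of one CSR realization, assuming a unit-square study area:

set.seed(505)
n <- 100
# First order: runif() gives every location equal probability;
# second order: each draw is independent of the others
csr <- data.frame(x = runif(n), y = runif(n))
plot(csr$x, csr$y, asp = 1, pch = 19)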
We can use quadrat counts to estimate the expected number of events in a given area
The probability of each possible count is given by:
\[ P(n,k) = {n \choose k}p^k(1-p)^{n-k} \]
With \(x\) equally sized quadrats, \(p = 1/x\), giving:
\[ \begin{equation} P(k,n,x) = {n \choose k}\bigg(\frac{1}{x}\bigg)^k\bigg(\frac{x-1}{x}\bigg)^{n-k} \end{equation} \]
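For instance, these probabilities can be evaluated with dbinom (the values of n and x here are hypothetical):

n <- 100      # total number of events
x <- 20       # number of quadrats
k <- 0:10     # candidate counts for a single quadrat
dbinom(k, size = n, prob = 1/x)   # P(k, n, x)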
Nearest neighbor methods throw away a lot of information
If points have independent, fixed marginal densities, then they exhibit complete spatial randomness (CSR)
The K function is an alternative, based on a series of circles with increasing radius
\[ \begin{equation} K(d) = \lambda^{-1}E(N_d) \end{equation} \]
\[ \begin{equation} K_{CSR}(d) = \pi d^2 \end{equation} \]
When working with a sample, the distribution of \(K\) is unknown; estimate it with
\[ \begin{equation} \hat{K}(d) = \hat{\lambda}^{-1}\sum_{i=1}^n\sum_{j \neq i}\frac{I(d_{ij} < d)}{n-1} \end{equation} \]
where:
\[ \begin{equation} \hat{\lambda} = \frac{n}{|A|} \end{equation} \]
The spatstat package provides several summary functions:
\(L\) function: square-root transformation of \(K\)
\(G\) function: the cumulative frequency distribution of the nearest-neighbor distances
\(F\) function: similar to \(G\), but based on randomly located points
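A sketch of these summaries in spatstat, using a simulated Poisson (CSR) pattern rather than real data:

library(spatstat)
set.seed(505)
pp <- rpoispp(lambda = 100)   # CSR pattern on the unit square
plot(Kest(pp))   # compare K-hat(d) to pi * d^2
plot(Lest(pp))   # L: square-root transformation of K
plot(Gest(pp))   # G: CDF of nearest-neighbor distances
plot(Fest(pp))   # F: distances from random locations to the nearest event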
"Everything is related to everything else, but near things are more related than distant things." (Tobler's first law of geography)
From Manuel Gimond
Use the spdep package
Estimate neighbors
Generate weighted average
library(spdep)   # neighbor construction and Moran's I
set.seed(2354)
# Load the Florida homicide-rate data (polygon layer saved as RDS)
s <- readRDS(url("https://github.com/mgimond/Data/raw/gh-pages/Exercises/fl_hr80.rds"))
# Define the neighbors (queen case: any shared boundary point)
nb <- poly2nb(s, queen=TRUE)
# Build row-standardized spatial weights from the neighbor list
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
# Estimate Moran's I for the 1980 homicide rate
moran.test(s$HR80, lw, alternative="greater")
Moran I test under randomisation
data: s$HR80
weights: lw
Moran I statistic standard deviate = 1.8891, p-value = 0.02944
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.136277593 -0.015151515 0.006425761
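The observed statistic (0.136) exceeds its expectation under randomization (-0.015) with \(p \approx 0.029\), so at the 0.05 level we reject the null of no spatial autocorrelation: homicide rates in neighboring units are modestly, positively correlated.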
Spatial autocorrelation threatens second order randomness
Areal data admit an infinite number of potential distances between units
Neighbor matrices, \(\boldsymbol W\), allow different characterizations of proximity
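For example, spdep supports several neighbor definitions; a sketch reusing the polygons in s from above (assuming s is an sf object):

nb_queen <- poly2nb(s, queen = TRUE)           # any shared boundary point
nb_rook  <- poly2nb(s, queen = FALSE)          # shared boundary segment required
cents    <- st_centroid(st_geometry(s))
nb_knn   <- knn2nb(knearneigh(cents, k = 4))   # four nearest centroids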
Goal: estimate the value of \(z\) at new locations \(\mathbf{x}\), given observations \(z_i\) at \(\mathbf{x}_i\)
Most useful for continuous values
Nearest-neighbor, Inverse Distance Weighting, Kriging
find \(i\) such that \(| \mathbf{x_i} - \mathbf{x}|\) is minimized
The estimate of \(z\) is \(z_i\)
library(tidyverse)   # readr, dplyr
library(lubridate)   # as_date
library(sf)
# Read daily monitor data, convert to points, and project
aq <- read_csv("data/ad_viz_plotval_data.csv") %>%
st_as_sf(., coords = c("SITE_LONGITUDE", "SITE_LATITUDE"), crs = "EPSG:4326") %>%
st_transform(., crs = "EPSG:8826") %>%
mutate(date = as_date(parse_datetime(Date, "%m/%d/%Y"))) %>%
filter(., date > "2023-07-01" & date < "2023-07-31")   # keep July 2023
# Average daily AQI for each monitoring site
aq.sum <- aq %>%
group_by(., `Site Name`) %>%
summarise(., meanpm25 = mean(DAILY_AQI_VALUE))
library(terra)
# Regular grid of prediction points covering the monitors
nodes <- st_make_grid(aq.sum,
what = "centers")
# Pairwise distances between grid nodes and monitoring sites
dist <- distance(vect(nodes), vect(aq.sum))
# For each node, take the value at the closest monitor
nearest <- apply(dist, 1, which.min)
aq.nn <- aq.sum$meanpm25[nearest]
# Attach the nearest-neighbor estimates and convert the grid to a raster
preds <- st_as_sf(nodes)
preds$aq <- aq.nn
preds <- as(preds, "Spatial")
sp::gridded(preds) <- TRUE
preds.rast <- rast(preds)
\[ \begin{equation} \hat{z}(\mathbf{x}) = \frac{\sum_{i=1}^n w_iz_i}{\sum_{i=1}^n w_i} \end{equation} \] where
\[ \begin{equation} w_i = | \mathbf{x} - \mathbf{x}_i |^{-\alpha} \end{equation} \] and \(\alpha > 0\) (\(\alpha = 1\) is inverse; \(\alpha = 2\) is inverse square)
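A base-R sketch of this estimator (idw_est is a hypothetical helper, not part of any package; it assumes \(\mathbf{x}\) does not coincide with an observation):

idw_est <- function(x0, xy, z, alpha = 2) {
  # Euclidean distances from the new location x0 to each observed location
  d <- sqrt((xy[, 1] - x0[1])^2 + (xy[, 2] - x0[2])^2)
  w <- d^(-alpha)         # inverse-distance weights
  sum(w * z) / sum(w)     # weighted average of the observations
}
# Example: three observations, estimate at the origin
idw_est(c(0, 0), xy = rbind(c(1, 0), c(0, 2), c(3, 3)), z = c(10, 20, 30))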
terra::interpolate provides flexible interpolation methods
Use the gstat package to develop the formula
library(gstat)
# IDW models: nmax = 7 nearest stations; idp is the distance-decay exponent (alpha)
mgsf05 <- gstat(id = "meanpm25", formula = meanpm25~1, data=aq.sum, nmax=7, set=list(idp = 0.5))
mgsf2 <- gstat(id = "meanpm25", formula = meanpm25~1, data=aq.sum, nmax=7, set=list(idp = 2))
# Wrapper so terra::interpolate can call predict() on sf points
interpolate_gstat <- function(model, x, crs, ...) {
v <- st_as_sf(x, coords=c("x", "y"), crs=crs)
p <- predict(model, v, ...)
as.data.frame(p)[,1:2]
}
# Interpolate onto the prediction raster with each exponent
zsf05 <- interpolate(preds.rast, mgsf05, debug.level=0, fun=interpolate_gstat, crs=crs(preds.rast), index=1)
zsf2 <- interpolate(preds.rast, mgsf2, debug.level=0, fun=interpolate_gstat, crs=crs(preds.rast), index=1)
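Plotting the two surfaces makes the effect of \(\alpha\) visible: the larger exponent weights nearby stations more heavily, producing a patchier map:

plot(zsf05, main = "IDW, alpha = 0.5")
plot(zsf2,  main = "IDW, alpha = 2")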