HES 505 Fall 2023: Session 18
Define point processes and their utility for ecological applications
Define first and second-order Complete Spatial Randomness
Use several common functions to explore point patterns
Leverage point patterns to interpolate missing data
Point pattern: A set of events within a study region (i.e., a window) generated by a random process
Set: A collection of distinct mathematical objects (here, the events)
Events: The existence of a point object of the type we are interested in at a particular location in the study region
A marked point pattern refers to a point pattern where the events have additional descriptors
Some notation:
\(S\): refers to the entire set
\(\mathbf{s_i}\) denotes the vector of data describing point \(s_i\) in set \(S\)
\(\#(S \in A )\) refers to the number of points in \(S\) within study area \(A\)
The pattern must be mapped on a plane to preserve distance
The study area, \(A\), should be objectively determined
There should be a \(1:1\) correspondence between objects in \(A\) and events in the pattern
Events must be proper, i.e., refer to actual locations of the event
For some analyses the pattern should be a census of the relevant events
Density-based metrics: the \(\#\) of points within area, \(a\), in study area \(A\)
Distance-based metrics: based on nearest neighbor distances or the distance matrix for all points
First order effects reflect variation in intensity due to variation in the ‘attractiveness’ of locations
Second order effects reflect variation in intensity due to the presence of points themselves
Modeling random processes means we are interested in probability densities of the points (first-order; density)
Also interested in how the presence of some events affects the probability of other events (second-order; distance)
Finally interested in how the attributes of an event affect location (marked)
Need to introduce a few new packages (`spatstat` and `gstat`)
\[ \begin{equation} \hat{\lambda} = \frac{\#(S \in A )}{a} \end{equation} \]
Local density = quadrat counts
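The global intensity \(\hat{\lambda}\) and local quadrat counts can be sketched in a few lines (hypothetical coordinates; in R, `spatstat`'s `quadratcount()` does the local counts on a `ppp` object):

```python
# Hypothetical event locations in a 10 x 10 study area A
points = [(1.2, 3.4), (2.5, 7.1), (8.3, 2.2), (6.6, 6.7), (9.1, 9.5), (4.4, 4.8)]
area = 10 * 10  # |A|

# Global density: lambda-hat = #(S in A) / a
lam_hat = len(points) / area
print(lam_hat)  # 0.06 events per unit area

# Local density: quadrat counts on a 2 x 2 grid of 5 x 5 cells
counts = [[0, 0], [0, 0]]
for x, y in points:
    counts[int(y // 5)][int(x // 5)] += 1
print(counts)  # [[2, 1], [1, 2]]
```

Marked variation in the quadrat counts is a first hint of departure from a homogeneous intensity.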
\[ \begin{equation} \hat{f}(x) = \frac{1}{nh_xh_y} \sum_{i=1}^n k\bigg(\frac{{x-x_i}}{h_x},\frac{{y-y_i}}{h_y} \bigg) \end{equation} \]
Assume each location \(\mathbf{s_i}\) is drawn from an unknown distribution
Distribution has probability density \(f(\mathbf{x})\)
Estimate \(f(\mathbf{x})\) by averaging probability “bumps” around each location
Need different object types for most operations in R (`as.ppp`)
\(h\) is the bandwidth and \(k\) is the kernel
We can use `stats::density` to explore
kernel: defines the shape, size, and weight assigned to observations in the window
bandwidth often assigned based on distance from the window center
Small values for \(h\) give ‘spiky’ densities
Large values for \(h\) smooth much more
Some kernels have methods for optimal bandwidth selection
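A minimal sketch of the kernel estimator \(\hat{f}(x)\) above, using a product Gaussian kernel and hypothetical points (in R, `spatstat`'s `density.ppp()` does this, including edge handling):

```python
import math

def gaussian_k(u, v):
    # Product Gaussian kernel k(u, v)
    return math.exp(-(u * u + v * v) / 2) / (2 * math.pi)

def kde(x, y, points, hx, hy):
    # f-hat(x) = (1 / (n * hx * hy)) * sum_i k((x - xi)/hx, (y - yi)/hy)
    n = len(points)
    s = sum(gaussian_k((x - xi) / hx, (y - yi) / hy) for xi, yi in points)
    return s / (n * hx * hy)

# Hypothetical event locations
points = [(2.0, 2.0), (2.5, 2.5), (8.0, 8.0)]

# Small h: 'spiky' density, large at events, near zero between them
spiky = kde(2.0, 2.0, points, 0.2, 0.2)
# Large h: much smoother, mass spread across the window
smooth = kde(2.0, 2.0, points, 3.0, 3.0)
print(spiky, smooth)
```

Evaluating at an event location shows the bandwidth effect directly: the small-\(h\) estimate is orders of magnitude larger there than the large-\(h\) estimate.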
tmap
package provides additional functionality
KDEs assume independence of points (first order randomness)
Second-order methods allow dependence amongst points (second-order randomness)
Several functions for assessing second order dependence (\(K\), \(L\), and \(G\))
Provide an estimate of the second order effects
Mean nearest-neighbor distance: \[\hat{d}_{min} = \frac{\sum_{i = 1}^{n} d_{min}(\mathbf{s_i})}{n}\]
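The mean nearest-neighbor distance is simple to compute directly (hypothetical coordinates; `spatstat`'s `nndist()` returns the \(d_{min}(\mathbf{s_i})\) values in R):

```python
import math

# Hypothetical event locations: two pairs of points 1 unit apart
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]

def nn_dist(i, pts):
    # d_min(s_i): distance from point i to its nearest neighbor
    xi, yi = pts[i]
    return min(math.hypot(xi - xj, yi - yj)
               for j, (xj, yj) in enumerate(pts) if j != i)

# Mean nearest-neighbor distance: average d_min over all n points
d_min_bar = sum(nn_dist(i, points) for i in range(len(points))) / len(points)
print(d_min_bar)  # every point's nearest neighbor is 1 unit away -> 1.0
```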
Nearest neighbor methods throw away a lot of information
If points have independent, fixed marginal densities, then they exhibit complete spatial randomness (CSR)
The K function is an alternative, based on a series of circles with increasing radius
\[ \begin{equation} K(d) = \lambda^{-1}E(N_d) \end{equation} \]
\[ \begin{equation} K_{CSR}(d) = \pi d^2 \end{equation} \]
When working with a sample, the distribution of \(K\) is unknown
Estimate with
\[ \begin{equation} \hat{K}(d) = \hat{\lambda}^{-1}\sum_{i=1}^n\sum_{j \neq i}\frac{I(d_{ij} <d)}{n-1} \end{equation} \]
where:
\[ \begin{equation} \hat{\lambda} = \frac{n}{|A|} \end{equation} \]
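A naive sketch of \(\hat{K}(d)\) (no edge correction) on a hypothetical simulated CSR pattern; with \(\hat{\lambda} = n/|A|\) the estimator reduces to \(|A|\sum_i\sum_{j \neq i} I(d_{ij}<d)/(n(n-1))\). In R, `spatstat`'s `Kest()` is the real tool and applies edge corrections:

```python
import math
import random

random.seed(1)
area = 100.0  # |A| for a 10 x 10 window
n = 200
# Hypothetical CSR pattern: independent uniform points
pts = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(n)]

def k_hat(d, pts, area):
    # Naive estimator (no edge correction):
    # K-hat(d) = |A| / (n (n - 1)) * sum_i sum_{j != i} I(d_ij < d)
    n = len(pts)
    count = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(pts[i], pts[j]) < d)
    return area * count / (n * (n - 1))

d = 1.0
# Under CSR, K(d) = pi * d^2; the naive estimate sits somewhat below it
# because circles around points near the boundary extend outside the window
print(k_hat(d, pts, area), math.pi * d ** 2)
```

That downward boundary bias is exactly why `spatstat` offers border, isotropic, and translation edge corrections.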
The `spatstat` package implements each of these
\(L\) function: a variance-stabilizing square-root transformation of \(K\), \(L(d) = \sqrt{K(d)/\pi}\)
\(G\) function: the cumulative frequency distribution of the nearest-neighbor distances
\(F\) function: similar to \(G\), but based on distances from randomly located points to the nearest event
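The \(G\) function is just the empirical CDF of the nearest-neighbor distances, sketched below on hypothetical points (in R, `spatstat`'s `Gest()` and `Fest()` compute \(G\) and \(F\) with edge corrections):

```python
import math

# Hypothetical event locations: a tight cluster plus one isolated point
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (4.0, 4.0)]

def g_hat(d, pts):
    # G(d): fraction of events whose nearest-neighbor distance is <= d
    n = len(pts)
    nn = [min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
          for i, p in enumerate(pts)]
    return sum(1 for v in nn if v <= d) / n

print(g_hat(1.0, pts))  # 0.75: three of four points have a neighbor within 1
print(g_hat(5.0, pts))  # 1.0: all nearest-neighbor distances are <= 5
```

Under CSR, \(\hat{G}\) rising faster than expected suggests clustering; rising more slowly suggests regularity.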