Point Pattern Analysis

HES 505 Fall 2023: Session 18

Matt Williamson

Objectives

  • Define a point process and their utility for ecological applications

  • Define first and second-order Complete Spatial Randomness

  • Use several common functions to explore point patterns

  • Leverage point patterns to interpolate missing data

What is a point pattern?

  • Point pattern: A set of events within a study region (i.e., a window) generated by a random process

  • Set: A collection of mathematical events

  • Events: The existence of a point object of the type we are interested in at a particular location in the study region

  • A marked point pattern refers to a point pattern where the events have additional descriptors

Some notation:

  • \(S\): refers to the entire set

  • \(\mathbf{s_i}\) denotes the vector of data describing point \(s_i\) in set \(S\)

  • \(\#(S \in A )\) refers to the number of points in \(S\) within study area \(A\)

Requirements for a set to be considered a point pattern

  • The pattern must be mapped on a plane to preserve distance

  • The study area, \(A\), should be objectively determined

  • There should be a \(1:1\) correspondence between objects in \(A\) and events in the pattern

  • Events must be proper i.e., refer to actual locations of the event

  • For some analyses the pattern should be a census of the relevant events

Describing Point Patterns

  • Density-based metrics: the \(\#\) of points within area, \(a\), in study area \(A\)

  • Distance-based metrics: based on nearest neighbor distances or the distance matrix for all points

  • First order effects reflect variation in intensity due to variation in the ‘attractiveness’ of locations

  • Second order effects reflect variation in intensity due to the presence of points themselves

from Manuel Gimond

Centrography

  • Mean center: the point, \(\hat{\mathbf{s}}\), whose coordinates are the average of all events in the pattern
  • Standard distance: a measure of the dispersion of points around the mean center
  • Standard ellipse: dispersion in one dimension

From Manuel Gimond

Analyzing Point Patterns

  • Modeling random processes means we are interested in probability densities of the points (first-order;density)

  • Also interested in how the presence of some events affects the probability of other events (second-order;distance)

  • Finally interested in how the attributes of an event affect location (marked)

  • Need to introduce a few new packages (spatstat and gstat)

Density based methods

  • The overall intensity of a point pattern is a crude density estimate

\[ \begin{equation} \hat{\lambda} = \frac{\#(S \in A )}{a} \end{equation} \] * Local density = quadrat counts

Analyzing Point Patterns

Kernel Density Estimates (KDE)

\[ \begin{equation} \hat{f}(x) = \frac{1}{nh_xh_y} \sum_{i=1}^n k\bigg(\frac{{x-x_i}}{h_x},\frac{{y-y_i}}{h_y} \bigg) \end{equation} \]

  • Assume each location in \(\mathbf{s_i}\) drawn from unknown distribution

  • Distribution has probability density \(f(\mathbf{x})\)

  • Estimate \(f(\mathbf{x})\) by averaging probability “bumps” around each location

  • Need different object types for most operations in R (as.ppp)

Kernel Density Estimates (KDE)

  • \(h\) is the bandwidth and \(k\) is the kernel

  • We can use stats::density to explore

  • kernel: defines the shape, size, and weight assigned to observations in the window

  • bandwidth often assigned based on distance from the window center

x <- rpoispp(lambda =50)
K1 <- density(x, bw=2)
K2 <- density(x, bw=10)
K3 <- density(x, bw=2, kernel="disc")

Choosing bandwidths and kernels

  • Small values for \(h\) give ‘spiky’ densities

  • Large values for \(h\) smooth much more

  • Some kernels have optimal bandwidth detection

  • tmap package provides additional functionality

Second-Order Analysis

Second-Order Analysis

  • KDEs assume independence of points (first order randomness)

  • Second-order methods allow dependence amongst points (second-order randomness)

  • Several functions for assessing second order dependence (\(K\), \(L\), and \(G\))

Distance based metrics

  • Provide an estimate of the second order effects

  • Mean nearest-neighbor distance: \[\hat{d}_{min} = \frac{\sum_{i = 1}^{m} d_{min}(\mathbf{s_i})}{n}\]

Nearest-neighbor distance

ANN <- apply(nndist(x, k=1:50),2,FUN=mean)
plot(ANN ~ eval(1:50), type="b", main=NULL, las=1)

Ripley’s \(K\) Function

  • Nearest neighbor methods throw away a lot of information

  • If points have independent, fixed marginal densities, then they exhibit complete, spatial randomness (CSR)

  • The K function is an alternative, based on a series of circles with increasing radius

\[ \begin{equation} K(d) = \lambda^{-1}E(N_d) \end{equation} \]

  • We can test for clustering by comparing to the expectation:

\[ \begin{equation} K_{CSR}(d) = \pi d^2 \end{equation} \]

  • if \(k(d) > K_{CSR}(d)\) then there is clustering at the scale defined by \(d\)

Ripley’s \(K\) Function

  • When working with a sample the distribution of \(K\) is unknown

  • Estimate with

\[ \begin{equation} \hat{K}(d) = \hat{\lambda}^{-1}\sum_{i=1}^n\sum_{j=1}^n\frac{I(d_{ij} <d)}{n(n-1)} \end{equation} \]

where:

\[ \begin{equation} \hat{\lambda} = \frac{n}{|A|} \end{equation} \]

Ripley’s \(K\) Function

  • Using the spatstat package

Ripley’s \(K\) Function

kf <- Kest(bramblecanes, correction-"border")
plot(kf)

Ripley’s \(K\) Function

  • accounting for variation in \(d\)
kf.env <- envelope(bramblecanes, correction="border", envelope = FALSE, verbose = FALSE)
plot(kf.env)

Other functions

  • \(L\) function: square root transformation of \(K\)

  • \(G\) function: the cummulative frequency distribution of the nearest neighbor distances

  • \(F\) function: similar to \(G\) but based on randomly located points