Integrating Rasters and Vector Data

HES 505 Fall 2023: Session 16

Matt Williamson

Today’s Plan

Objectives

Use dplyr with predicates and measures to subset and manipulate data
Use extract to access raster data
Use zonal to summarize access data
Join data into a single analyzable dataframe

Motivating Question

How do Collaborative Forest Landscape Restoration projects compare to other National Forest lands with respect to social and wildfire risks?

Thinking about the data

Datasets - Forest Service Boundaries, CFLRP Boundaries, Wildfire Risk Raster, CEJST shapefile
Dependent Variable - CFLRP (T or F)
Independent Variables - Wildfire hazard, income, education, housing burdent

Building some Pseudocode

1. Load Libraries
2. Load data
3. Check validity and alignment
4. Subset to relevant geographies
5. Select relevant attributes
6. Extract wildfire risk
7. CFLRP T or F

Load libraries

library(sf)
library(terra)
library(tidyverse)
library(tmap)

Load the data

Downloading USFS data using tempfiles and unzip

### FS Boundaries
tmp <- tempfile()
fs.url <- "https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.AdministrativeForest.zip"
download.file(fs.url, tmp)
tmp2 <- tempfile()
unzip(zipfile=tmp, exdir = tmp2 )

fs.bdry <- read_sf(tmp2)

### CFLRP Data
tmp <- tempfile()
cflrp.url <- "https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.CFLR_HPRP_ProjectBoundary.zip"
download.file(cflrp.url, tmp)
tmp2 <- tempfile()
unzip(zipfile=tmp, exdir = tmp2 )
cflrp.bdry <- read_sf(tmp2)

Load the data

From our class folder

### Wildfire Hazard Data
wildfire.haz <- rast("opt/data/2023/assignment01/wildfire_hazard_agg.tif")

## CEJST data
cejst <- read_sf("opt/data/2023/assignment01/cejst_pnw.shp")

Check Validity

The USFS datasets are new; let’s check the geometries

all(st_is_valid(fs.bdry))

[1] FALSE

all(st_is_valid(cflrp.bdry))

[1] FALSE

Make them valid

fs.bdry.valid <- st_make_valid(fs.bdry)
all(st_is_valid(fs.bdry.valid))

[1] TRUE

cflrp.bdry.valid <- st_make_valid(cflrp.bdry)
all(st_is_valid(cflrp.bdry.valid))

[1] TRUE

Set Projection

We know these are in different CRS
Project to the CRS of the raster
Using %>% to pipe data through the function

cflrp.proj <- cflrp.bdry.valid %>% st_transform(., crs=crs(wildfire.haz))
cejst.proj <- cejst %>% st_transform(., crs=crs(wildfire.haz))
fs.proj <- fs.bdry.valid %>% st_transform(., crs=crs(wildfire.haz))

Subset Geometries

We can use the [] notation to subset the one dataset based on the geometry of the other
Need USFS and CFLRP within the region
Then need tracts that overlap USFS

fs.subset <- fs.proj[cejst.proj, ]
cflrp.subset <- cflrp.proj[cejst.proj, ]
cejst.subset <- cejst.proj[fs.subset, ]

Subset geometries

sub.map <- tm_shape(cejst.subset) +
  tm_polygons(col="gray") +
  tm_shape(fs.subset) + 
  tm_polygons(col="darkgreen") +
  tm_shape(cflrp.subset) +
  tm_polygons(col="goldenrod")

Select Relevant Columns

Use the codebook to identify the right columns
Then use select from dplyr
geometries are sticky!

cejst.df <- cejst.subset %>% 
  select(GEOID10, LMI_PFS, LHE, HBF_PFS)
head(cejst.df)

Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -1598729 ymin: 2388182 xmax: -1475201 ymax: 3000813
Projected CRS: unnamed
# A tibble: 6 × 5
  GEOID10     LMI_PFS   LHE HBF_PFS                                     geometry
  <chr>         <dbl> <int>   <dbl>                           <MULTIPOLYGON [m]>
1 16025970100    0.75     0    0.6  (((-1485848 2427049, -1485813 2426977, -148…
2 16057005500    0.43     0    0.44 (((-1567845 2843218, -1567803 2843209, -156…
3 16057005600    0.3      0    0.05 (((-1573408 2823058, -1573412 2823071, -157…
4 16009950100    0.55     1    0.4  (((-1566013 2857573, -1565966 2857623, -156…
5 16017950500    0.75     1    0.53 (((-1545064 2973813, -1545058 2973793, -154…
6 16017950400    0.6      1    0.68 (((-1544958 2973552, -1544957 2973564, -154…

Extract wildfire data

Can use zonal for one summary statistic
Or extract for multiple

wildfire.zones <- terra::zonal(wildfire.haz, vect(cejst.df), fun="mean", na.rm=TRUE)

head(wildfire.zones)

     WHP_ID
1 2997.7951
2  182.8864
3  386.9580
4  173.1703
5  193.4199
6  210.4406