Assignment 4 Solutions: Predicates and Measures

1. Load the cejst_nw.shp use the correct predicates to determine whether the geometries are valid and to check for empty geometries. If there are empty geometries, determine which rows have empty geometries (show your code).

Remember that predicates return logical (i.e. TRUE or FALSE) answers so we are looking for functions with st_is_* to look for valid or empty geometries. We wrap those in the all() or any() function calls so that we get a single TRUE or FALSE for the entire geometry collection rather than returning the value for each individual observation. While those can be useful for figuring out if the entire dataset meets our criteria (i.e., all are valid or any have empty geometries), identifying which records have empty geometries takes an additional step. We use which() to return the row index of each record that returns a TRUE for st_is_empty() and then subset the original data using the [] notation keeping only the rows with empty geometries and all other columns.

library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(terra)
terra 1.7.39

Attaching package: 'terra'

The following object is masked from 'package:tidyr':

    extract
cejst.nw <- read_sf("data/opt/data/2023/assignment04/cejst_nw.shp")
all(st_is_valid(cejst.nw))
[1] TRUE
any(st_is_empty(cejst.nw))
[1] TRUE
which(st_is_empty(cejst.nw))
 [1]  787  891  894  895  911  916  971 1213 1252 1708 1709 1898 2097 2106 2107
[16] 2321 2322 2324 2423
cejst.nw[which(st_is_empty(cejst.nw)),]
Simple feature collection with 19 features and 123 fields (with 19 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
# A tibble: 19 × 124
   GEOID10   SF    CF    DF_PFS AF_PFS HDF_PFS DSF_PFS EBF_PFS EALR_PFS EBLR_PFS
   <chr>     <chr> <chr>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
 1 41057990… Oreg… Till…     NA     NA      NA      NA      NA       NA       NA
 2 41007990… Oreg… Clat…     NA     NA      NA      NA      NA       NA       NA
 3 41011990… Oreg… Coos…     NA     NA      NA      NA      NA       NA       NA
 4 41015990… Oreg… Curr…     NA     NA      NA      NA      NA       NA       NA
 5 41039990… Oreg… Lane…     NA     NA      NA      NA      NA       NA       NA
 6 41041990… Oreg… Linc…     NA     NA      NA      NA      NA       NA       NA
 7 41019990… Oreg… Doug…     NA     NA      NA      NA      NA       NA       NA
 8 53055990… Wash… San …     NA     NA      NA      NA      NA       NA       NA
 9 53067990… Wash… Thur…     NA     NA      NA      NA      NA       NA       NA
10 53061990… Wash… Snoh…     NA     NA      NA      NA      NA       NA       NA
11 53061990… Wash… Snoh…     NA     NA      NA      NA      NA       NA       NA
12 53057990… Wash… Skag…     NA     NA      NA      NA      NA       NA       NA
13 53033990… Wash… King…     NA     NA      NA      NA      NA       NA       NA
14 53035990… Wash… Kits…     NA     NA      NA      NA      NA       NA       NA
15 53049990… Wash… Paci…     NA     NA      NA      NA      NA       NA       NA
16 53027990… Wash… Gray…     NA     NA      NA      NA      NA       NA       NA
17 53029992… Wash… Isla…     NA     NA      NA      NA      NA       NA       NA
18 53031990… Wash… Jeff…     NA     NA      NA      NA      NA       NA       NA
19 53009990… Wash… Clal…     NA     NA      NA      NA      NA       NA       NA
# ℹ 114 more variables: EPLR_PFS <dbl>, HBF_PFS <dbl>, LLEF_PFS <dbl>,
#   LIF_PFS <dbl>, LMI_PFS <dbl>, PM25F_PFS <dbl>, HSEF <dbl>, P100_PFS <dbl>,
#   P200_I_PFS <dbl>, AJDLI_ET <int>, LPF_PFS <dbl>, KP_PFS <dbl>,
#   NPL_PFS <dbl>, RMP_PFS <dbl>, TSDF_PFS <dbl>, TPF <dbl>, TF_PFS <dbl>,
#   UF_PFS <dbl>, WF_PFS <dbl>, UST_PFS <dbl>, N_WTR <int>, N_WKFC <int>,
#   N_CLT <int>, N_ENY <int>, N_TRN <int>, N_HSG <int>, N_PLN <int>,
#   N_HLTH <int>, SN_C <int>, SN_T <chr>, DLI <int>, ALI <int>, PLHSE <int>, …

2. Load the landmarks_ID.csv table and convert it to an sf object. Now filter to just the hospital records (MTFCC == "K1231") and calculate the distance between all of the hospitals in Idaho. Note that you’ll have to figure out the CRS for the landmarks dataset…

Here we are interested in distance which is a measure (not a predicate or transformer), but to get there we need to take a few extra steps. First, we read in the csv file and convert it to coordinates (using st_as_sf, a transformer). Then we use dplyr::filter to retain only the hospitals in the dataset. Finally, because this is a lat/long dataset, we assume a geodetic projection of WGS84 and assign it to the filtered object. Once we’ve gotten all that squared away, it’s just a matter of using the st_distance function to return the distance matrix for all objects in the dataset.

hospitals.id <- read_csv("data/opt/data/2023/assignment04/landmarks_ID.csv") %>% 
  st_as_sf(., coords = c("longitude", "lattitude")) %>% 
  filter(., MTFCC == "K1231")
Rows: 12169 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ANSICODE, FULLNAME, MTFCC
dbl (4): STATEFP, POINTID, longitude, lattitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
st_crs(hospitals.id) <- 4269

dist.hospital <- st_distance(hospitals.id)

dist.hospital[1:5, 1:5]
Units: [m]
          [,1]      [,2]     [,3]     [,4]     [,5]
[1,]      0.00  61303.25 186961.5 480013.6 538397.3
[2,]  61303.25      0.00 134425.2 492433.2 547734.4
[3,] 186961.47 134425.16      0.0 459255.9 505156.4
[4,] 480013.57 492433.21 459255.9      0.0  62690.4
[5,] 538397.34 547734.35 505156.4  62690.4      0.0

3. Filter the cejst_nw.shp to just those records from Ada County. Then filter again to return the row with the highest annual loss rate for agriculture (2 hints: you’ll need to look at the columns.csv file in the data folder to figure out which column is the expected agricultural loss rate and you’ll need to set na.rm=TRUEwhen looking for the maximum value). Calculate the area of the resulting polygon.

This one should be relatively straightforward. We start with another call to dplyr::filter to get down to just the tracts in Ada County (note the use of the & to combine two logical calls). Then we use a second filter to return the row with the max value for agricultural loss. Note that we have to use the na.rm=TRUE argument to avoid having the NA values force the function to return NA.

ada.cejst <- cejst.nw %>% 
  filter(., SF == "Idaho" & CF == "Ada County") 

ada.max.EALR <- ada.cejst %>%  
  filter(., EALR_PFS == max(EALR_PFS, na.rm = TRUE))
  
ada.max.EALR[, c("SF", "CF", "EALR_PFS")]
Simple feature collection with 1 feature and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -116.1998 ymin: 43.11297 xmax: -115.9742 ymax: 43.59134
Geodetic CRS:  WGS 84
# A tibble: 1 × 4
  SF    CF         EALR_PFS                                             geometry
  <chr> <chr>         <dbl>                                   <MULTIPOLYGON [°]>
1 Idaho Ada County      0.7 (((-116.1371 43.55217, -116.137 43.55222, -116.1369…

4. Finally, look at the helpfile for the terra::adjacent command. How do you specify which cells you’d like to get the adjacency matrix for? How do you return only the cells touching your cells of interest? Use the example in the helpfile to illustrate how you’d do this on a toy dataset - this will help you learn to ask minimally reproducible examples.

We can access the helpfile for adjacent by using ?terra::adjacent (I won’t do that here because I don’t want to print the entire helpfile). From that we can see that the cells argument is the place to specify which cells we are interested in. also see that the directions argument allows us to specify whether we want “rook”, “bishop”, or “queen” neighbors. Finally, we see that if we want to exclude the focal cell itself, we have to set include to FALSE. By plotting the map with the cell numbers, we can see that cells 1 and 5 are on th top row of the raster and thus do not have any neighbors for for the upper 3 categories whereas cell 55 hase all 8 neighbors. If you choose cells that are in the center of the raster, you get all neighbors

r <- rast(nrows=10, ncols=10)
cellnum <- cells(r)
r[] <- cellnum
plot(r)

adjacent(r, cells=c(1, 5, 55), directions="queen", include=FALSE)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
0   NaN  NaN  NaN   10    2   20   11   12
4   NaN  NaN  NaN    4    6   14   15   16
54   44   45   46   54   56   64   65   66
adjacent(r, cells=c(51, 52, 55), directions="queen", include=FALSE)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
50   50   41   42   60   52   70   61   62
51   41   42   43   51   53   61   62   63
54   44   45   46   54   56   64   65   66