1. Load each dataset
2. Subset to schools
3. Check geometry validity
4. Align CRS
5. Find tracts within 50km
6. Make Maps
Assignment 6 Solutions: Vector Operations
We want to begin to assess the role of distance from schools in determining the education outcomes for Idahoans. We’ll use the landmarks_pnw.csv
and cejst_pnw.shp
datasets as the basis for this assignment. You’ll need to load the csv and convert it to an sf
object. We want to compare the percentage of individuals age 25 or over with less than a high school degree (HSEF
in the cejst dataset) for of counties within 50km of a school (MTFCC == K2543
) to those that are more than 50km.
You’ll need to follow many of the same operations in the video example from class. Your assignment is:
1. Write out the pseudocode for your analysis
We’ll need to do a few things here including load the data, find the tracts within 50km of a school, and then compare the cejst results. Breaking that into pseudocode would look like:
Note that my fift step (find tracts within 50km) is a little vague. There are lots of ways I could do this. It might be more helpful to add some specificity here like:
1. Load each dataset
2. Subset to schools
3. Check geometry validity
4. Align CRS
5. Buffer schools by 50km
6. Select tracts within the buffer and attribute
7. Make Maps
There are other ways to do this too (like calculating the distance), but those are likely to be more computationally intensive so I’ll leave it at this.
2. Translate the pseudocode into code chunks and create the necessary code (You’ll need to use things like st_distance
, st_buffer
, st_sym_difference
)
Loading the data should be pretty straightforward for you by now. We use
read_sf
for the shapefile andread_csv
for thelandmarks.csv
. We then filter the data here so that we aren’t working with the entire landmarks dataset.
library(sf)
library(tidyverse, quietly = TRUE)
library(tmap, quietly = TRUE)
<- read_sf("data/opt/data/2023/assignment06/cejst_pnw.shp")
cejst.pnw <- read_csv("data/opt/data/2023/assignment06/landmarks_pnw.csv") %>%
landmarks.pnw filter(., MTFCC == "K2543")
We know that one of the datasets are still in long/lat form so we’ll need to make it a
sf
object before checking the geometry makes any sense. We’ll also assign the crs here by adding it to thest_as_sf
call. We also need to make sure that there aren’t any empty geometries as that will cause problems for mapping later.
<- landmarks.pnw %>%
landmarks.sf st_as_sf(., coords = c("longitude", "latitude"), crs=4269)
all(st_is_valid(cejst.pnw))
[1] TRUE
all(st_is_valid(landmarks.sf))
[1] TRUE
any(st_is_empty(cejst.pnw))
[1] TRUE
any(st_is_empty(landmarks.sf))
[1] FALSE
Looks like all the geometries are valid, but there are some empty geometries in the cejst dataset. We will drop those by using
filter
combined with the negation operator (!
) andst_is_empty
to return the rows wherest_is_empty
is not equal toTRUE
. Then we can move forward with making sure the two datasets are aligned by usingst_transform
to change the CRS. We can verify the alignment using a simple call to theplot
function.
<- cejst.pnw %>%
cejst.pnw filter(., !st_is_empty(.))
<- landmarks.sf %>%
landmarks.proj st_transform(., crs=st_crs(cejst.pnw))
plot(st_geometry(cejst.pnw))
plot(st_geometry(landmarks.proj), add=TRUE, col="red")
Now it’s time to find the tracts that are within 50km of a school. We can do this a few ways. First, we’ll use the
st_buffer
approach. We can also calculate the distance matrix between the schools and the tracts usingst_distance
. This adds a little more complexity as we have to then find the values that are greater than 50km. You’ll notice thatst_distance
returns a matrix with a row for each school and a column for each tract. This is a little clumsier to deal with, but more precise than a simple buffer.
<- landmarks.proj %>%
school.buf st_buffer(., dist=50000)
<- st_distance(landmarks.proj, cejst.pnw)
school.dist dim(school.dist)
[1] 220 2571
Once we have the buffered “footprint” of the school we can use
st_filter
(which filters using topological relations) combined with thest_covered_by
predicate to find all of thecejst.pnw
tracts that are covered by the buffer. Notice that if we use the typical[]
subset we get over 200 more records. This is because the latter takes all tracts with an intersection (rather than using our covered by criteria.). We can alter this to achieve the same result as thest_filter
by adding theop=
argument. Using the distance to the points themselves can be a more conservative way of calculating this, but takes a little more work. First we have to get a list of all of the tracts that fall within 50km (usingst_is_within_distance
), then identify which of those list elements are empty (i.e., no schools are within 50km), then set that as an index to subset our cejst data.
<- cejst.pnw %>%
school.tracts.stf st_filter(x =., y = school.buf, .predicate = st_covered_by)
<- cejst.pnw[school.buf,]
school.tracts.sbst <- cejst.pnw[school.buf,, op=st_covered_by]
school.tracts.sbst2
nrow(school.tracts.stf)
[1] 2285
nrow(school.tracts.sbst)
[1] 2512
nrow(school.tracts.sbst2)
[1] 2285
identical(school.tracts.stf, school.tracts.sbst2)
[1] TRUE
<- st_is_within_distance(cejst.pnw, landmarks.proj, dist=50000, sparse = TRUE)
within50 <- lengths(within50) > 0
within50.idx <- cejst.pnw[within50.idx,]
school.tracts.sbst3 nrow(school.tracts.sbst3)
[1] 2511
3. Make a map for both the percentage of individuals with less than a high school degree in counties within 50km and beyond 50km (i.e. make 2 maps)
We now have the full cejst dataset and a dataset that is subsetted to the tracts within 50km of a school. Plotting the
HSEF
values for the tracts within 50km of a school is easy enough. Just map a layer that contains all of the tracts and set it’s color to gray. Then layer the subsetted features on top. Plotting the values ofHSEF
for the tracts beyond 50km is a little trickier (because we haven’t created that dataset yet). We can use the index we created in the previous step to do that here. We can also use themutate
function to create an indicator variable using our index and then create a “small multiples” style map that plots the two side by side. We’ll learn more about “prettying” up these maps in the later parts of the course.
tm_shape(cejst.pnw) +
tm_polygons(col="gray") +
tm_shape(school.tracts.sbst3) +
tm_fill(col="HSEF")
<- cejst.pnw[!within50.idx,]
noschool.tracts tm_shape(cejst.pnw) +
tm_polygons(col="gray") +
tm_shape(noschool.tracts) +
tm_fill(col="HSEF")
<- cejst.pnw %>%
school.combined mutate(., indist = if_else(lengths(within50) > 0, "within50km", "notWithin50km"))
tm_shape(cejst.pnw) +
tm_polygons(col="gray") +
tm_shape(school.combined) +
tm_fill(col="HSEF") +
tm_facets(by = c("indist"), nrow = 1)