Spatial analysis in R

STAT150 | R for Data Science

Ihor Miroshnychenko

Kyiv School of Economics

Maps

John Snow and 1854 cholera epidemic

  • 10% of the population of Soho died in a week (!!)
  • Miasma theory said it was because the air was bad

The Broad Street pump

Outright lies

Fake maps and junk maps

Points can be useless

Choropleths can be great

Choropleths can distort

Land doesn’t vote

Cartograms

World projections

UA projections

Which projection is best?

  • None of them
  • There are no good or bad projections
  • There are appropriate and inappropriate projections
  • It depends on the purpose of the map

Putting data on maps

Maps with lines

Maps with points

Voronoi maps

Small multiples that look like maps

GIS in R with sf

Shapefiles

  • Geographic information is shared as shapefiles
  • These are not like regular single CSV files!
  • Shapefiles come as zipped files with a bunch of different files inside
data/ne_110m_admin_0_countries
├── ne_110m_admin_0_countries.cpg
├── ne_110m_admin_0_countries.dbf
├── ne_110m_admin_0_countries.prj
├── ne_110m_admin_0_countries.README.html
├── ne_110m_admin_0_countries.shp
├── ne_110m_admin_0_countries.shx
└── ne_110m_admin_0_countries.VERSION.txt

Structure of a shapefile

library(sf)

world_shapes <- read_sf("data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")

world_shapes |> 
  select(TYPE, GEOUNIT, ISO_A3, geometry) |> 
  head(7)
Simple feature collection with 7 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
Geodetic CRS:  WGS 84
# A tibble: 7 × 4
  TYPE              GEOUNIT                  ISO_A3                     geometry
  <chr>             <chr>                    <chr>            <MULTIPOLYGON [°]>
1 Sovereign country Fiji                     FJI    (((180 -16.06713, 180 -16.5…
2 Sovereign country Tanzania                 TZA    (((33.90371 -0.95, 34.07262…
3 Indeterminate     Western Sahara           ESH    (((-8.66559 27.65643, -8.66…
4 Sovereign country Canada                   CAN    (((-122.84 49, -122.9742 49…
5 Country           United States of America USA    (((-122.84 49, -120 49, -11…
6 Sovereign country Kazakhstan               KAZ    (((87.35997 49.21498, 86.59…
7 Sovereign country Uzbekistan               UZB    (((55.96819 41.30864, 55.92…

Where to find shapefiles

Scales

1:10m = 1:10,000,000
1 cm = 100 km

1:50m = 1:50,000,000
1 cm = 500 km

1:110m = 1:110,000,000
1 cm = 1,100 km

Using too high of a resolution makes your maps slow and huge

Latitude and longitude

The magic geometry column

As long as you have a magic geometry column, all you need to do to plot maps is geom_sf()

The magic geometry column

  • Use coord_sf() to change projections
ggplot() +
  geom_sf(data = world_shapes) +
  coord_sf(crs = "+proj=robin")

The magic geometry column

  • Use coord_sf() to change projections
ggplot() +
  geom_sf(data = world_shapes) +
  coord_sf(crs = "+proj=laea +lat_0=52 +lon_0=10")

Use aesthetics like normal

All regular ggplot layers and aesthetics work

ggplot() +
  geom_sf(data = world_shapes, 
          aes(fill = POP_EST),
          color = "white", size = 0.15) +
  coord_sf(crs = "+proj=robin") +
  scale_fill_gradient(labels = scales::comma) +
  labs(fill = NULL) +
  theme_void() +
  theme(legend.position = "bottom")

No geometry column?

Make your own with st_as_sf()

other_data <- tibble(
  country = c("Kyiv", "Lviv", "Odesa"),
  lon = c(30.5234, 24.0316, 30.7326),
  lat = c(50.4501, 49.8397, 46.4825)
)
other_data
# A tibble: 3 × 3
  country   lon   lat
  <chr>   <dbl> <dbl>
1 Kyiv     30.5  50.5
2 Lviv     24.0  49.8
3 Odesa    30.7  46.5
other_data |> 
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 24.0316 ymin: 46.4825 xmax: 30.7326 ymax: 50.4501
Geodetic CRS:  WGS 84
# A tibble: 3 × 2
  country          geometry
* <chr>         <POINT [°]>
1 Kyiv    (30.5234 50.4501)
2 Lviv    (24.0316 49.8397)
3 Odesa   (30.7326 46.4825)

sf is for all GIS stuff

  • Draw maps
  • Calculate distances between points
  • Count observations in a given area
  • Anything else related to geography!

geom_sf() is today’s standard

  • You’ll sometimes find older tutorials and StackOverflow/LLM’s answers about using geom_map() or ggmap() or other things
  • Those still work, but they don’t use the same magical sf system with easy-to-convert projections and other GIS stuff


Stick with sf and geom_sf() and your life will be easy

Crimea comming home

The Natural Earth Project

  • The Natural Earth Project is a public domain map dataset available at three scales: 1:10m, 1:50m, and 1:110m.
  • rnaturalearth is an R package that makes it easy to download and use these maps in R.
library(tidyverse)
library(sf)
library(rnaturalearth)

# Set some colors
ukr_blue <- "#0057b7"  # Blue from the Ukrainian flag
ukr_yellow <- "#ffdd00"  # Yellow from the Ukrainian flag
rus_red <- "#d62718"  # Red from the Russian flag

clr_ocean <- "#d9f0ff"
clr_land <- "#facba6"

# CARTOColors Prism (https://carto.com/carto-colors/)
carto_prism = c(
  "#5F4690", "#1D6996", "#38A6A5", "#0F8554", "#73AF48", "#EDAD08", 
  "#E17C05", "#CC503E", "#94346E", "#6F4070", "#994E95", "#666666"
)

ne_countries(scale = 110) |> 
  filter(admin != "Antarctica") |> 
  ggplot() + 
  geom_sf(aes(fill = continent), color = "white", linewidth = 0.1) +
  scale_fill_manual(values = carto_prism, guide = "none") +
  coord_sf(crs = "+proj=robin") +
  theme_void()

Natural Earth’s de facto policy 😡

Natural Earth draws boundaries of sovereign states according to de facto (“in fact”) status rather than de jure (“by law”).

Natural Earth de jure points of view

Example UA POV

  • UA-POV 10m shapefile shows Crimea in Ukraine.
  • Download manually; rnaturalearth::ne_countries() doesn’t support POV.
world_10_us <- read_sf("data/ne_10m_admin_0_countries_ukr/ne_10m_admin_0_countries_ukr.shp")

world_10_us |> 
  filter(ADMIN == "Ukraine") |> 
  ggplot() +
  geom_sf(fill = ukr_blue) +
  theme_void()

Example UA POV

Low and medium resolution POV 😡

  • Low and medium resolution POV shapefiles are not available yet.
world <- ne_countries(scale = 110, type = "map_units")

ukraine <- world |> filter(admin == "Ukraine")
russia <- world |> filter(admin == "Russia")

ukraine_bbox <- ukraine |> 
  st_buffer(dist = 100000) |>  # Add 100,000 meter buffer around the country 
  st_bbox()

ggplot() +
  geom_sf(data = world, fill = clr_land) +
  geom_sf(data = russia, fill = rus_red) + 
  geom_sf(data = ukraine, fill = ukr_blue, color = ukr_yellow, linewidth = 2) + 
  coord_sf(
    xlim = c(ukraine_bbox["xmin"], ukraine_bbox["xmax"]), 
    ylim = c(ukraine_bbox["ymin"], ukraine_bbox["ymax"])
  ) +
  theme_void() +
  theme(panel.background = element_rect(fill = clr_ocean))

Low and medium resolution POV 😡

Return Crimea to Ukraine w/ R and sf

  • The actual geometric shapes for all the countries in world are MULTIPOLYGONs, or collections of POLYGON geometric objects:
russia |> st_geometry()
Geometry set for 1 feature 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: 41.15142 xmax: 180 ymax: 81.2504
Geodetic CRS:  WGS 84

Return Crimea to Ukraine w/ R and sf

  • We can split MULTIPOLYGONs into their component POLYGONs with st_cast(). Russia consists of 14 different shapes:
(russia_polygons <- russia |> st_geometry() |> st_cast("POLYGON"))
Geometry set for 14 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: 41.15142 xmax: 180 ymax: 81.2504
Geodetic CRS:  WGS 84
First 5 geometries:
plot(russia_polygons[2])
plot(russia_polygons[14])

Identifying the Crimea

  • Plotting each POLYGON is brittle (counts/order change by map scale). Instead, place a POINT in Crimea (e.g., 45°N, 34°E) and select the Russian POLYGON that contains it:
crimea_point <- st_sfc(st_point(c(34, 45)), crs = st_crs(world))

ggplot() +
  geom_sf(data = world, fill = clr_land) +
  geom_sf(data = russia, fill = rus_red) + 
  geom_sf(data = ukraine, fill = ukr_blue, color = ukr_yellow, linewidth = 2) + 
  geom_sf(data = crimea_point) +
  coord_sf(
    xlim = c(ukraine_bbox["xmin"], ukraine_bbox["xmax"]), 
    ylim = c(ukraine_bbox["ymin"], ukraine_bbox["ymax"])
  ) +
  theme_void() +
  theme(panel.background = element_rect(fill = clr_ocean))

Identifying the Crimea

Extracting Crimea polygon

  • st_intersects() tells us which POLYGON contains the POINT:
# Extract the Russia MULTIPOLYGON and convert it to polygons
russia_polygons <- world |> 
  filter(admin == "Russia") |> 
  st_geometry() |> 
  st_cast("POLYGON")

# Extract the Russia polygon that has Crimea in it
crimea_polygon <- russia_polygons |>
  keep(\(x) st_intersects(x, crimea_point, sparse = FALSE))

# This is the same as russia_polygons[14]
plot(crimea_polygon)

Extracting Crimea polygon from Russia

  • Now we can remove Crimea from Russia
# Remove Crimea from Russia
new_russia <- russia_polygons |>
  discard(\(x) any(st_equals(x, crimea_polygon, sparse = FALSE))) |> 
  st_combine() |> 
  st_cast("MULTIPOLYGON")

ggplot() +
  geom_sf(data = world, fill = clr_land) +
  geom_sf(data = new_russia, fill = rus_red) + 
  geom_sf(data = ukraine, fill = ukr_blue, color = ukr_yellow, linewidth = 2) + 
  coord_sf(
    xlim = c(ukraine_bbox["xmin"], ukraine_bbox["xmax"]), 
    ylim = c(ukraine_bbox["ymin"], ukraine_bbox["ymax"])
  ) +
  theme_void() +
  theme(panel.background = element_rect(fill = clr_ocean))

Extracting Crimea polygon from Russia

Adding Crimea to Ukraine

# Extract the Ukraine MULTIPOLYGON and convert it to polygons
ukraine_polygons <- world |> 
  filter(admin == "Ukraine") |> 
  st_geometry() |> 
  st_cast("POLYGON")

# Add Crimea to Ukraine
true_ukraine <- st_union(c(ukraine_polygons, crimea_polygon)) |>
  st_cast("MULTIPOLYGON")

ggplot() +
  geom_sf(data = world, fill = clr_land) +
  geom_sf(data = new_russia, fill = rus_red) + 
  geom_sf(data = true_ukraine, fill = ukr_blue, color = ukr_yellow, linewidth = 2) + 
  coord_sf(
    xlim = c(ukraine_bbox["xmin"], ukraine_bbox["xmax"]), 
    ylim = c(ukraine_bbox["ymin"], ukraine_bbox["ymax"])
  ) +
  theme_void() +
  theme(panel.background = element_rect(fill = clr_ocean))

Adding Crimea to Ukraine

Update full data

world_un <- world |>
  mutate(geometry = case_when(
    admin == "Ukraine" ~ true_ukraine,
    admin == "Russia" ~ new_russia,
    .default = geometry
  ))

Welcome back, Crimea!

eastern_eu_bbox <- ukraine |> 
  st_buffer(dist = 700000) |>  # Add 700,000 meter buffer around the country 
  st_bbox()

ggplot() +
  geom_sf(data = world_un, aes(fill = factor(mapcolor9)), linewidth = 0.25, color = "white") +
  scale_fill_manual(values = carto_prism, guide = "none") +
  coord_sf(
    xlim = c(eastern_eu_bbox["xmin"], eastern_eu_bbox["xmax"]), 
    ylim = c(eastern_eu_bbox["ymin"], eastern_eu_bbox["ymax"])
  ) +
  theme_void() +
  theme(panel.background = element_rect(fill = clr_ocean))

Welcome back, Crimea!

Alt data sources

Questions?



Course materials

imiroshnychenko@kse.org.ua

@araprof

@datamirosh

@ihormiroshnychenko

@aranaur

aranaur.rbind.io