Purpose

Apply Stanford’s Open Policing methodology on Mpls Police Data.

https://openpolicing.stanford.edu/

They have an R script for Saint Paul but not Mpls.

Description

Data Sources

Minneapolis Police Stop Data: https://opendata.minneapolismn.gov/datasets/police-stop-data

Other Notes

Change log

## Load libraries

# read
library(tidycensus); census_api_key("0c5d9fddc67544f23d3fa524660c9d81d4be2c90") # add API key here

# wrangle
library(tidyverse)
library(lubridate) # work with dates and times

# viz
library(janitor) # clean names and make nice tables
library(gghighlight)
library(ggthemes)
library(gt) # nice tables

# model
library(sf) # GIS

# export

## Custom functions

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

Read

Get police stop data.

Use an API link (ArcGIS) to directly download the latest geojson using st_read

Looking at the first rows of the data:

stop <- st_read("https://opendata.arcgis.com/datasets/215b4b543d894750aef86c725b56ee2a_0.geojson")
## Reading layer `Police_Stop_Data' from data source `https://opendata.arcgis.com/datasets/215b4b543d894750aef86c725b56ee2a_0.geojson' using driver `GeoJSON'
## Simple feature collection with 171935 features and 19 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -93.32915 ymin: 0 xmax: 0 ymax: 45.05124
## geographic CRS: WGS 84
head(stop)
## Simple feature collection with 6 features and 19 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -93.28807 ymin: 44.90617 xmax: -93.24741 ymax: 45.02484
## geographic CRS: WGS 84
##   OBJECTID masterIncidentNumber        responseDate reason
## 1        1            16-395258 2016-10-31 17:40:47       
## 2        2            16-395296 2016-10-31 18:06:36       
## 3        3            16-395326 2016-10-31 18:20:54       
## 4        4            16-395328 2016-10-31 18:23:20       
## 5        5            16-395333 2016-10-31 18:26:05       
## 6        6            16-395364 2016-10-31 18:45:31   <NA>
##                       problem     callDisposition citationIssued personSearch
## 1       Suspicious Person (P)         BKG-Booking                         YES
## 2 Traffic Law Enforcement (P)          TAG-Tagged                          NO
## 3         Attempt Pick-Up (P)         RFD-Refused                          NO
## 4       Suspicious Person (P)         BKG-Booking                         YES
## 5      Suspicious Vehicle (P) GOA-Gone on Arrival                          NO
## 6 Traffic Law Enforcement (P)         ADV-Advised           <NA>         <NA>
##   vehicleSearch preRace    race  gender      lat      long         x       y
## 1            NO   Black   Black    Male 44.97957 -93.27257 -10383055 5618306
## 2            NO Unknown   Black    Male 44.96269 -93.27592 -10383428 5615650
## 3            NO Unknown Unknown Unknown 45.02484 -93.28807 -10384780 5625432
## 4            NO   Black   Black    Male 44.94656 -93.24741 -10380254 5613112
## 5            NO   Other Unknown    Male 44.90617 -93.25501 -10381100 5606762
## 6          <NA>    <NA>    <NA>    <NA> 44.93954 -93.27791 -10383650 5612008
##   policePrecinct                     neighborhood      lastUpdateDate
## 1              1                    Downtown West 2017-08-08 05:25:31
## 2              5 Steven's Square - Loring Heights 2017-08-08 05:26:13
## 3              4                  Webber - Camden 2017-08-08 05:24:35
## 4              3                         Corcoran 2017-08-08 05:25:31
## 5              3                             Hale 2017-08-08 05:25:03
## 6              5                          Lyndale 2017-08-08 05:25:31
##                     geometry
## 1 POINT (-93.27257 44.97957)
## 2 POINT (-93.27592 44.96269)
## 3 POINT (-93.28807 45.02484)
## 4 POINT (-93.24741 44.94656)
## 5 POINT (-93.25501 44.90617)
## 6 POINT (-93.27791 44.93954)

Overview of the Data

How many total stops? Distinct count of master incident numbers.

n_distinct(stop$masterIncidentNumber)
## [1] 171935

What’s the date range?

min(stop$responseDate)
## [1] "2016-10-31 17:40:47 CDT"
max(stop$responseDate)
## [1] "2021-01-12 21:26:47 CST"

Let’s focus on 2016-2020. Convert from sf to a tibble - faster to work with, and can join geometry back later on.

stop2 <- stop %>%
  filter(year(responseDate) >= 2017,
         year(responseDate) < 2021) %>%
  as_tibble() %>%
  select(-geometry)

For now, look at vehicular stops. Try doing pedestrian stops later.

stop2 %>% 
  tabyl(reason) %>% 
  adorn_pct_formatting() %>%
  gt()
reason n percent valid_percent
10432 6.3% 7.2%
Citizen / 9-1-1 42854 26.0% 29.6%
Equipment Violation 20871 12.7% 14.4%
Investigative 28823 17.5% 19.9%
Moving Violation 41674 25.3% 28.8%
NA 19887 12.1% -
stop2 %>% 
  tabyl(problem) %>% 
  adorn_pct_formatting() %>%
  gt()
problem n percent
Attempt Pick-Up (P) 3643 2.2%
Curfew Violations (P) 112 0.1%
Suspicious Person (P) 45784 27.8%
Suspicious Vehicle (P) 36458 22.2%
Traffic Law Enforcement (P) 78492 47.7%
Truancy (P) 52 0.0%

What’s the interaction between reasons and problems?

stop2 %>% 
  tabyl(reason, problem) %>% 
  gt()
reason Attempt Pick-Up (P) Curfew Violations (P) Suspicious Person (P) Suspicious Vehicle (P) Traffic Law Enforcement (P) Truancy (P)
220 6 1830 2004 6364 8
Citizen / 9-1-1 1196 16 22804 18590 237 11
Equipment Violation 2 1 332 866 19670 0
Investigative 1130 69 12723 8877 6001 23
Moving Violation 0 1 878 1912 38883 0
NA 1095 19 7217 4209 7337 10

Missing data

Do we know why many stop reasons are NA? Race and Vehicle/Person search for these are NA as well, so drop from the data.

stop2 %>%
  filter(is.na(reason)) %>%
  tabyl(race)
##  race     n percent valid_percent
##  <NA> 19887       1            NA

Filtering data

Does it make sense to include both Suspicious Vehicles and Traffic Law Enforcement as problems? Also filter out missing reasons.

How many stops now?

stop_v <- stop2 %>%
  filter(problem %in% c("Suspicious Vehicle (P)", "Traffic Law Enforcement (P)"),
         !is.na(reason))

n_distinct(stop_v$masterIncidentNumber) # only one incident per row, so we can do a row count
## [1] 103404

How many by year?

stop_v %>%
  mutate(year = year(responseDate)) %>%
  count(year) %>%
  gt()
year n
2017 32715
2018 29140
2019 24008
2020 17541

How many stops by race?

stop_v %>%
  tabyl(race) %>%
  adorn_pct_formatting() %>%
  gt()
race n percent
Asian 1705 1.6%
Black 37962 36.7%
East African 6388 6.2%
Latino 4712 4.6%
Native American 1647 1.6%
Other 2673 2.6%
Unknown 21147 20.5%
White 27170 26.3%

Stops by race over the years:

stop_v %>%
  mutate(year = year(responseDate)) %>%
  count(year, race) %>%
  ggplot(aes(year, n, color = race)) +
    geom_line(size = 1) +
    geom_point() +
    gghighlight(race %in% c("White", "Black", "East African", "Unknown",
                            "Latino", NA)) +
    theme_fivethirtyeight() +
    labs(title = "Police stops over time by race")

A few observations:

  • Stops overall seem to have gone down since 2018
  • There are more unknown race in more recent data (2020)
  • There was more missing data (represented by the black line) in 2017-2018 (about 5000 a year) but this has reduced in recent years (better data collection methods?)
  • Black people are stopped the most, followed by white people
  • In other jurisdictions, East African would (likely) be included with the Black race

Benchmark Test

Get a population as a baseline to compare stop rates. Are black/East African drivers stopped proportionally more than other drivers?

Get the population estimates from the census using tidycensus

# list of variables to get and their names
var <- c(
  "Total" = "DP05_0033",
  "White" = "DP05_0077", # using white non-hispanic
  "Black" = "DP05_0038",
  "Native American" = "DP05_0039",
  "Asian" = "DP05_0044",
  "Latino" = "DP05_0071", # hispanic/latinx of any race
  "Other" = "DP05_0057" # some other race
)

# query census API with list of variables

mpls_race <- get_acs(
  geography = "place", state = "MN",
  year = 2019,
  survey = "acs5", # get 5 year estimates - smaller margin of error and more appropriate for multi-year data
  variables = var
) %>%
  filter(str_detect(NAME, "Minneapolis")) %>% # get only Minneapolis from other MN places
  select(-c(moe, GEOID, NAME))

# get percentages (out of total)

mpls_race %>%
  mutate(total = estimate[variable == "Total"]) %>%
  mutate(prop = estimate / total) %>%
  select(-c(total)) %>%
  gt() %>%
    fmt_percent(
      columns = vars(prop),
      decimals = 1
    )
variable estimate prop
Total 420324 100.0%
Black 80664 19.2%
Native American 5956 1.4%
Asian 24961 5.9%
Other 20915 5.0%
Latino 40432 9.6%
White 252180 60.0%

To match census data, combine East African with black drivers.

stop_v2 <- stop_v %>%
  mutate(race2 = if_else(race == "East African", "Black", race))

stop_v2 %>%
  tabyl(race2) %>%
  adorn_pct_formatting() %>%
  gt()
race2 n percent
Asian 1705 1.6%
Black 44350 42.9%
Latino 4712 4.6%
Native American 1647 1.6%
Other 2673 2.6%
Unknown 21147 20.5%
White 27170 26.3%

Stop Rates

What is the rate of police stops considering the population of races in Mpls?

Not sure what to do about Unknown/NA - quite a few in this category

stop_v2 %>%
  count(race2) %>%
  left_join(mpls_race, by = c("race2" = "variable")) %>%
  mutate(stop_rate = n / estimate) %>%
  gt() %>%
    fmt_number(
      columns = vars(stop_rate),
      decimals = 2
    )
race2 n estimate stop_rate
Asian 1705 24961 0.07
Black 44350 80664 0.55
Latino 4712 40432 0.12
Native American 1647 5956 0.28
Other 2673 20915 0.13
Unknown 21147 NA NA
White 27170 252180 0.11

Black drivers are stopped more than 5 times compared to white drivers. Latinx slightly more. Native American drivers are stopped 1.5 times more than white drivers.

Search Rates

What proportion of drivers of each race are searched (vehicle or person)?

stop_v2 %>%
  group_by(race2) %>%
  summarize(
    search_rate = mean(vehicleSearch == "YES", na.rm = TRUE),
    frisk_rate = mean(personSearch == "YES", na.rm = TRUE)
  ) %>%
  gt() %>%
    fmt_number(
      columns = vars(search_rate, frisk_rate),
      decimals = 2
    )
race2 search_rate frisk_rate
Asian 0.06 0.06
Black 0.16 0.17
Latino 0.09 0.09
Native American 0.25 0.31
Other 0.09 0.09
Unknown 0.04 0.02
White 0.05 0.06

Black drivers are 3 times more likely to have their vehicle searched or be frisked than white drivers. Native American drivers are 4.7 times more likely for vehicle searches and 5.8 times for being frisked.

Caveats

Here the caveats from the Stanford Open Policing Project are important. These rates are not in themselves evidence of bias or discrimination because we don’t know the underlying rates of breaking traffic laws or committing crimes. Also it may reflect differences in transportation across races.

Outcome Test

Hit Rate (Citations and Booking)

Here we could look at hit rate or the contraband recovery rate - how often searches were justified by finding contraband.

We don’t have that in this data set. We do have whether a citation was issued (as its own variable) and booking as part of the call disposition variable.

Should we look at vehicle and person search (frisking) separately?

stop_v2 %>% 
  tabyl(personSearch, vehicleSearch) %>%
  gt()
personSearch NO YES
NO 90679 2157
YES 1924 8644

Actually, vehicle combined with person searches are more common than either search alone. To simplify, we can create a variable specifying whether any type of search is conducted.

stop_v3 <- stop_v2 %>%
  mutate(any_search = if_else(personSearch == "YES" | vehicleSearch == "YES",
                              1, 0))
  
stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(race2) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  ) %>%
  gt() %>%
    fmt_number(
      columns = vars(citation_rate, book_rate),
      decimals = 2
    )
race2 citation_rate book_rate
Asian 0.13 0.49
Black 0.18 0.32
Latino 0.27 0.36
Native American 0.10 0.53
Other 0.15 0.38
Unknown 0.12 0.11
White 0.16 0.45

Here lower rates could signal discrimination. If black drivers are stopped but only cited or booked 5% of the time, and white drivers 50% of the time, “would lead us to believe that officers made sure they were certain white individuals had contraband before deciding to search, but that they were searching black individuals on a whiff of evidence” (Stanford Open Policing Project).

Here we see black drivers have a higher citation rate than white drivers - Native American drivers have a somewhat lower rate. However, 45% of searches end up with white drivers booked, but only 32% of black drivers, and 32% of Latinx drivers. Here Native American drivers have a higher book rate.

stop_v3 %>%
  filter(any_search == 1) %>%
  mutate(year = year(responseDate)) %>%
  group_by(year, race2) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  ) %>%
  pivot_longer(citation_rate:book_rate, names_to = "rate", values_to = "value") %>%
  filter(race2 %in% c("White", "Black", "Native American", "Latino")) %>%
  ggplot(aes(year, value, color = race2)) +
    geom_line(size = 1) +
    facet_wrap(~ rate) +
    theme_fivethirtyeight() +
    labs(title = "Rates of booking and citations by race over time",
         color = "") +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1))

By Neighborhood

Does the citation or booking rate differ across police precincts? There are only 5 precincts, may not be useful to see. What about Minneapolis neighborhoods?

rates <- stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(race2, neighborhood) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  )

# What are the total stops by neighborhood?

stops_by_nh <- stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(neighborhood) %>%
  summarize(stops = n())

# make a tidy data frame with white rates, and minority rates per neighborhood

citation_rate <- rates %>%
  ungroup() %>%
  select(-book_rate) %>%
  filter(race2 %in% c("White", "Black", "Latino"),
         !is.na(neighborhood)) %>%
  pivot_wider(names_from = "race2", values_from = "citation_rate",
              values_fill = 0) %>%
  left_join(stops_by_nh, by = "neighborhood") %>%
  pivot_longer(c(Black:Latino), names_to = "minority_race",
               values_to = "minority_citation_rate") %>%
  rename(white_citation_rate = White)

book_rate <- rates %>%
  ungroup() %>%
  select(-citation_rate) %>%
  filter(race2 %in% c("White", "Black", "Latino"),
         !is.na(neighborhood)) %>%
  pivot_wider(names_from = "race2", values_from = "book_rate",
              values_fill = 0) %>%
  left_join(stops_by_nh, by = "neighborhood") %>%
  pivot_longer(c(Black:Latino), names_to = "minority_race",
               values_to = "minority_book_rate") %>%
  rename(white_book_rate = White)

citation_rate %>%
  ggplot(aes(x = white_citation_rate, y = minority_citation_rate)) +
    geom_point(aes(size = stops), pch = 21) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    facet_wrap(~ minority_race) +
    scale_y_continuous(labels = scales::percent_format()) +
    scale_x_continuous(labels = scales::percent_format()) +
    labs(y = "Minority Citation Rate", x = "White Citation Rate",
         title = "Comparing Citation Rates by Neighborhood")

book_rate %>%
  ggplot(aes(x = white_book_rate, y = minority_book_rate)) +
    geom_point(aes(size = stops), pch = 21) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    facet_wrap(~ minority_race) +
    scale_y_continuous(labels = scales::percent_format()) +
    scale_x_continuous(labels = scales::percent_format()) +
    labs(y = "Minority Book Rate", x = "White Book Rate",
         title = "Comparing Book Rates by Neighborhood")

For most neighborhoods, the citation rate is higher or the same for white drivers as it is for black or Latinx drivers.

For book rates though, the majority of neighborhoods, particularly those with the most stops, have lower rates for black/Latinx drivers.

This signals that officers are applying lower threshholds to minority drivers when it comes to offenses that are bookable. I think.

Model

Export