Read

Get police stop data.

Use an API link (ArcGIS) to directly download the latest geojson using st_read

Looking at the first rows of the data:

stop <- st_read("https://opendata.arcgis.com/datasets/215b4b543d894750aef86c725b56ee2a_0.geojson")

## Reading layer `Police_Stop_Data' from data source `https://opendata.arcgis.com/datasets/215b4b543d894750aef86c725b56ee2a_0.geojson' using driver `GeoJSON'
## Simple feature collection with 171935 features and 19 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -93.32915 ymin: 0 xmax: 0 ymax: 45.05124
## geographic CRS: WGS 84

head(stop)

## Simple feature collection with 6 features and 19 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -93.28807 ymin: 44.90617 xmax: -93.24741 ymax: 45.02484
## geographic CRS: WGS 84
##   OBJECTID masterIncidentNumber        responseDate reason
## 1        1            16-395258 2016-10-31 17:40:47       
## 2        2            16-395296 2016-10-31 18:06:36       
## 3        3            16-395326 2016-10-31 18:20:54       
## 4        4            16-395328 2016-10-31 18:23:20       
## 5        5            16-395333 2016-10-31 18:26:05       
## 6        6            16-395364 2016-10-31 18:45:31   <NA>
##                       problem     callDisposition citationIssued personSearch
## 1       Suspicious Person (P)         BKG-Booking                         YES
## 2 Traffic Law Enforcement (P)          TAG-Tagged                          NO
## 3         Attempt Pick-Up (P)         RFD-Refused                          NO
## 4       Suspicious Person (P)         BKG-Booking                         YES
## 5      Suspicious Vehicle (P) GOA-Gone on Arrival                          NO
## 6 Traffic Law Enforcement (P)         ADV-Advised           <NA>         <NA>
##   vehicleSearch preRace    race  gender      lat      long         x       y
## 1            NO   Black   Black    Male 44.97957 -93.27257 -10383055 5618306
## 2            NO Unknown   Black    Male 44.96269 -93.27592 -10383428 5615650
## 3            NO Unknown Unknown Unknown 45.02484 -93.28807 -10384780 5625432
## 4            NO   Black   Black    Male 44.94656 -93.24741 -10380254 5613112
## 5            NO   Other Unknown    Male 44.90617 -93.25501 -10381100 5606762
## 6          <NA>    <NA>    <NA>    <NA> 44.93954 -93.27791 -10383650 5612008
##   policePrecinct                     neighborhood      lastUpdateDate
## 1              1                    Downtown West 2017-08-08 05:25:31
## 2              5 Steven's Square - Loring Heights 2017-08-08 05:26:13
## 3              4                  Webber - Camden 2017-08-08 05:24:35
## 4              3                         Corcoran 2017-08-08 05:25:31
## 5              3                             Hale 2017-08-08 05:25:03
## 6              5                          Lyndale 2017-08-08 05:25:31
##                     geometry
## 1 POINT (-93.27257 44.97957)
## 2 POINT (-93.27592 44.96269)
## 3 POINT (-93.28807 45.02484)
## 4 POINT (-93.24741 44.94656)
## 5 POINT (-93.25501 44.90617)
## 6 POINT (-93.27791 44.93954)

Overview of the Data

How many total stops? Distinct count of master incident numbers.

n_distinct(stop$masterIncidentNumber)

## [1] 171935

What’s the date range?

min(stop$responseDate)

## [1] "2016-10-31 17:40:47 CDT"

max(stop$responseDate)

## [1] "2021-01-12 21:26:47 CST"

Let’s focus on 2016-2020. Convert from sf to a tibble - faster to work with, and can join geometry back later on.

stop2 <- stop %>%
  filter(year(responseDate) >= 2017,
         year(responseDate) < 2021) %>%
  as_tibble() %>%
  select(-geometry)

For now, look at vehicular stops. Try doing pedestrian stops later.

stop2 %>% 
  tabyl(reason) %>% 
  adorn_pct_formatting() %>%
  gt()

reason	n	percent	valid_percent
	10432	6.3%	7.2%
Citizen / 9-1-1	42854	26.0%	29.6%
Equipment Violation	20871	12.7%	14.4%
Investigative	28823	17.5%	19.9%
Moving Violation	41674	25.3%	28.8%
NA	19887	12.1%	-

stop2 %>% 
  tabyl(problem) %>% 
  adorn_pct_formatting() %>%
  gt()

problem	n	percent
Attempt Pick-Up (P)	3643	2.2%
Curfew Violations (P)	112	0.1%
Suspicious Person (P)	45784	27.8%
Suspicious Vehicle (P)	36458	22.2%
Traffic Law Enforcement (P)	78492	47.7%
Truancy (P)	52	0.0%

What’s the interaction between reasons and problems?

stop2 %>% 
  tabyl(reason, problem) %>% 
  gt()

reason	Attempt Pick-Up (P)	Curfew Violations (P)	Suspicious Person (P)	Suspicious Vehicle (P)	Traffic Law Enforcement (P)	Truancy (P)
	220	6	1830	2004	6364	8
Citizen / 9-1-1	1196	16	22804	18590	237	11
Equipment Violation	2	1	332	866	19670	0
Investigative	1130	69	12723	8877	6001	23
Moving Violation	0	1	878	1912	38883	0
NA	1095	19	7217	4209	7337	10

Missing data

Do we know why many stop reasons are NA? Race and Vehicle/Person search for these are NA as well, so drop from the data.

stop2 %>%
  filter(is.na(reason)) %>%
  tabyl(race)

##  race     n percent valid_percent
##  <NA> 19887       1            NA

Filtering data

Does it make sense to include both Suspicious Vehicles and Traffic Law Enforcement as problems? Also filter out missing reasons.

How many stops now?

stop_v <- stop2 %>%
  filter(problem %in% c("Suspicious Vehicle (P)", "Traffic Law Enforcement (P)"),
         !is.na(reason))

n_distinct(stop_v$masterIncidentNumber) # only one incident per row, so we can do a row count

## [1] 103404

How many by year?

stop_v %>%
  mutate(year = year(responseDate)) %>%
  count(year) %>%
  gt()

year	n
2017	32715
2018	29140
2019	24008
2020	17541

How many stops by race?

stop_v %>%
  tabyl(race) %>%
  adorn_pct_formatting() %>%
  gt()

race	n	percent
Asian	1705	1.6%
Black	37962	36.7%
East African	6388	6.2%
Latino	4712	4.6%
Native American	1647	1.6%
Other	2673	2.6%
Unknown	21147	20.5%
White	27170	26.3%

Stops by race over the years:

stop_v %>%
  mutate(year = year(responseDate)) %>%
  count(year, race) %>%
  ggplot(aes(year, n, color = race)) +
    geom_line(size = 1) +
    geom_point() +
    gghighlight(race %in% c("White", "Black", "East African", "Unknown",
                            "Latino", NA)) +
    theme_fivethirtyeight() +
    labs(title = "Police stops over time by race")

A few observations:

Stops overall seem to have gone down since 2018
There are more unknown race in more recent data (2020)
There was more missing data (represented by the black line) in 2017-2018 (about 5000 a year) but this has reduced in recent years (better data collection methods?)
Black people are stopped the most, followed by white people
In other jurisdictions, East African would (likely) be included with the Black race

Benchmark Test

Get a population as a baseline to compare stop rates. Are black/East African drivers stopped proportionally more than other drivers?

Get the population estimates from the census using tidycensus

# list of variables to get and their names
var <- c(
  "Total" = "DP05_0033",
  "White" = "DP05_0077", # using white non-hispanic
  "Black" = "DP05_0038",
  "Native American" = "DP05_0039",
  "Asian" = "DP05_0044",
  "Latino" = "DP05_0071", # hispanic/latinx of any race
  "Other" = "DP05_0057" # some other race
)

# query census API with list of variables

mpls_race <- get_acs(
  geography = "place", state = "MN",
  year = 2019,
  survey = "acs5", # get 5 year estimates - smaller margin of error and more appropriate for multi-year data
  variables = var
) %>%
  filter(str_detect(NAME, "Minneapolis")) %>% # get only Minneapolis from other MN places
  select(-c(moe, GEOID, NAME))

# get percentages (out of total)

mpls_race %>%
  mutate(total = estimate[variable == "Total"]) %>%
  mutate(prop = estimate / total) %>%
  select(-c(total)) %>%
  gt() %>%
    fmt_percent(
      columns = vars(prop),
      decimals = 1
    )

variable	estimate	prop
Total	420324	100.0%
Black	80664	19.2%
Native American	5956	1.4%
Asian	24961	5.9%
Other	20915	5.0%
Latino	40432	9.6%
White	252180	60.0%

To match census data, combine East African with black drivers.

stop_v2 <- stop_v %>%
  mutate(race2 = if_else(race == "East African", "Black", race))

stop_v2 %>%
  tabyl(race2) %>%
  adorn_pct_formatting() %>%
  gt()

race2	n	percent
Asian	1705	1.6%
Black	44350	42.9%
Latino	4712	4.6%
Native American	1647	1.6%
Other	2673	2.6%
Unknown	21147	20.5%
White	27170	26.3%

Stop Rates

What is the rate of police stops considering the population of races in Mpls?

Not sure what to do about Unknown/NA - quite a few in this category

stop_v2 %>%
  count(race2) %>%
  left_join(mpls_race, by = c("race2" = "variable")) %>%
  mutate(stop_rate = n / estimate) %>%
  gt() %>%
    fmt_number(
      columns = vars(stop_rate),
      decimals = 2
    )

race2	n	estimate	stop_rate
Asian	1705	24961	0.07
Black	44350	80664	0.55
Latino	4712	40432	0.12
Native American	1647	5956	0.28
Other	2673	20915	0.13
Unknown	21147	NA	NA
White	27170	252180	0.11

Black drivers are stopped more than 5 times compared to white drivers. Latinx slightly more. Native American drivers are stopped 1.5 times more than white drivers.

Search Rates

What proportion of drivers of each race are searched (vehicle or person)?

stop_v2 %>%
  group_by(race2) %>%
  summarize(
    search_rate = mean(vehicleSearch == "YES", na.rm = TRUE),
    frisk_rate = mean(personSearch == "YES", na.rm = TRUE)
  ) %>%
  gt() %>%
    fmt_number(
      columns = vars(search_rate, frisk_rate),
      decimals = 2
    )

race2	search_rate	frisk_rate
Asian	0.06	0.06
Black	0.16	0.17
Latino	0.09	0.09
Native American	0.25	0.31
Other	0.09	0.09
Unknown	0.04	0.02
White	0.05	0.06

Black drivers are 3 times more likely to have their vehicle searched or be frisked than white drivers. Native American drivers are 4.7 times more likely for vehicle searches and 5.8 times for being frisked.

Caveats

Here the caveats from the Stanford Open Policing Project are important. These rates are not in themselves evidence of bias or discrimination because we don’t know the underlying rates of breaking traffic laws or committing crimes. Also it may reflect differences in transportation across races.

Outcome Test

Hit Rate (Citations and Booking)

Here we could look at hit rate or the contraband recovery rate - how often searches were justified by finding contraband.

We don’t have that in this data set. We do have whether a citation was issued (as its own variable) and booking as part of the call disposition variable.

Should we look at vehicle and person search (frisking) separately?

stop_v2 %>% 
  tabyl(personSearch, vehicleSearch) %>%
  gt()

personSearch	NO	YES
NO	90679	2157
YES	1924	8644

Actually, vehicle combined with person searches are more common than either search alone. To simplify, we can create a variable specifying whether any type of search is conducted.

stop_v3 <- stop_v2 %>%
  mutate(any_search = if_else(personSearch == "YES" | vehicleSearch == "YES",
                              1, 0))
  
stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(race2) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  ) %>%
  gt() %>%
    fmt_number(
      columns = vars(citation_rate, book_rate),
      decimals = 2
    )

race2	citation_rate	book_rate
Asian	0.13	0.49
Black	0.18	0.32
Latino	0.27	0.36
Native American	0.10	0.53
Other	0.15	0.38
Unknown	0.12	0.11
White	0.16	0.45

Here lower rates could signal discrimination. If black drivers are stopped but only cited or booked 5% of the time, and white drivers 50% of the time, “would lead us to believe that officers made sure they were certain white individuals had contraband before deciding to search, but that they were searching black individuals on a whiff of evidence” (Stanford Open Policing Project).

Here we see black drivers have a higher citation rate than white drivers - Native American drivers have a somewhat lower rate. However, 45% of searches end up with white drivers booked, but only 32% of black drivers, and 32% of Latinx drivers. Here Native American drivers have a higher book rate.

stop_v3 %>%
  filter(any_search == 1) %>%
  mutate(year = year(responseDate)) %>%
  group_by(year, race2) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  ) %>%
  pivot_longer(citation_rate:book_rate, names_to = "rate", values_to = "value") %>%
  filter(race2 %in% c("White", "Black", "Native American", "Latino")) %>%
  ggplot(aes(year, value, color = race2)) +
    geom_line(size = 1) +
    facet_wrap(~ rate) +
    theme_fivethirtyeight() +
    labs(title = "Rates of booking and citations by race over time",
         color = "") +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1))

By Neighborhood

Does the citation or booking rate differ across police precincts? There are only 5 precincts, may not be useful to see. What about Minneapolis neighborhoods?

rates <- stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(race2, neighborhood) %>%
  summarize(
    citation_rate = mean(citationIssued == "YES", na.rm = TRUE),
    book_rate = mean(callDisposition == "BKG-Booking", na.rm = TRUE)
  )

# What are the total stops by neighborhood?

stops_by_nh <- stop_v3 %>%
  filter(any_search == 1) %>%
  group_by(neighborhood) %>%
  summarize(stops = n())

# make a tidy data frame with white rates, and minority rates per neighborhood

citation_rate <- rates %>%
  ungroup() %>%
  select(-book_rate) %>%
  filter(race2 %in% c("White", "Black", "Latino"),
         !is.na(neighborhood)) %>%
  pivot_wider(names_from = "race2", values_from = "citation_rate",
              values_fill = 0) %>%
  left_join(stops_by_nh, by = "neighborhood") %>%
  pivot_longer(c(Black:Latino), names_to = "minority_race",
               values_to = "minority_citation_rate") %>%
  rename(white_citation_rate = White)

book_rate <- rates %>%
  ungroup() %>%
  select(-citation_rate) %>%
  filter(race2 %in% c("White", "Black", "Latino"),
         !is.na(neighborhood)) %>%
  pivot_wider(names_from = "race2", values_from = "book_rate",
              values_fill = 0) %>%
  left_join(stops_by_nh, by = "neighborhood") %>%
  pivot_longer(c(Black:Latino), names_to = "minority_race",
               values_to = "minority_book_rate") %>%
  rename(white_book_rate = White)

citation_rate %>%
  ggplot(aes(x = white_citation_rate, y = minority_citation_rate)) +
    geom_point(aes(size = stops), pch = 21) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    facet_wrap(~ minority_race) +
    scale_y_continuous(labels = scales::percent_format()) +
    scale_x_continuous(labels = scales::percent_format()) +
    labs(y = "Minority Citation Rate", x = "White Citation Rate",
         title = "Comparing Citation Rates by Neighborhood")

book_rate %>%
  ggplot(aes(x = white_book_rate, y = minority_book_rate)) +
    geom_point(aes(size = stops), pch = 21) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    facet_wrap(~ minority_race) +
    scale_y_continuous(labels = scales::percent_format()) +
    scale_x_continuous(labels = scales::percent_format()) +
    labs(y = "Minority Book Rate", x = "White Book Rate",
         title = "Comparing Book Rates by Neighborhood")

For most neighborhoods, the citation rate is higher or the same for white drivers as it is for black or Latinx drivers.

For book rates though, the majority of neighborhoods, particularly those with the most stops, have lower rates for black/Latinx drivers.

This signals that officers are applying lower threshholds to minority drivers when it comes to offenses that are bookable. I think.

Mpls Police Stop Data

Richard Bamattre

2020-12-22