This tutorial accompanies the Small Town Big Data blog post Bears getting into trash cans: the data behind a mountain town rite of spring.


Trash bears and mountain towns. You can’t have one without the other.

In this tutorial, we look at when bears are getting into trash cans - and when humans are calling the cops on them. And we get to explore these patterns through a rich dataset of 2020 black bear reports from all over Colorado.

We’ll cover:

To start, I recommend downloading RStudio.

The data are available in the Github repository associated with this tutorial.

Open up a new .R file and start cutting and pasting!


Data: Colorado 2020 bear reports

Colorado Parks and Wildlife recently started keeping detailed spatial and temporal records of wildlife complaints. So far, 2020 is the only complete year of these data, but the insights are already proving useful. Each data point includes the date and time of call, what happened and what the officer found on the scene, whether food source was involved, the sex of the animal, if it was tagged and more.

The exact locations of these calls are also available, however, in the interest of privacy I’ve redacted exact home locations and addresses. While these data are technically public, it is not the interest of this blog to out the garbage-keeping practices of specific homeowners and businesses. We’re here to explore data patterns and the stories within. So instead of publishing the exact spatial data on the blog, I made an interactive map exploring the spatial patterns of bear reports using gridded heat maps I created in the R package mapdeck. That map is not part of this tutorial. If you’re interested in making your own mapdeck object using a different dataset, you can find a tutorial on that in March’s Small Town Big Data post on Social Connectedness. A heartfelt thanks to Randy Hampton at CPW for sharing these data for the purpose of this tutorial.

Time to start cleaning.

Load in and check out the dataset

# Set your working directory to the /data folder included in the repository using setwd(). 
# Install the following packages for this tutorial: 
# #install.packages(c("data.table", "lubridate", "dplyr", "tidyr", "purrr", "stringr", "ggplot2", "scales"))
library(data.table)
library(lubridate)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(ggplot2)
library(scales)


bears<-read.csv("data/bears.csv", stringsAsFactors = F)
names(bears)
##  [1] "Section.1.Header"        "Timestamp"              
##  [3] "Species"                 "Location.Header"        
##  [5] "GMU"                     "County"                 
##  [7] "District"                "Area"                   
##  [9] "Region"                  "Same.Address"           
## [11] "Incident.Header"         "Incident.Date"          
## [13] "Incident.Time"           "Complaint.Type"         
## [15] "Type.of.Property.Damage" "Service.Provided"       
## [17] "Hazing.Method"           "Animal.Data"            
## [19] "Previous.Ear.Tag."       "Ear.Tag.Side"           
## [21] "Ear.Tag.Color"           "Ear.Tag.Number"         
## [23] "Previous.PIT.tag.."      "Previous.Collar"        
## [25] "Sex"                     "Age"                    
## [27] "Color"                   "Further.Response"       
## [29] "Staff.Assigned"          "Site.Inspection.Heading"
## [31] "Violation"               "Violation.Action"       
## [33] "Citation.."              "Marked.Animal.Heading"  
## [35] "Capture.Method"          "New.Ear.Tag.Color"      
## [37] "New.Ear.Tag.Number"      "New.PIT.Tag..Dec."      
## [39] "New.Collar"              "Collar.Manufacturer"    
## [41] "Collar.Type"             "Collar.Frequency"       
## [43] "Collar.Code"             "Collar.Serial.."        
## [45] "Release.Header"          "Release.Date"           
## [47] "Release.Location"        "Seal.."                 
## [49] "Photos."                 "Photo.1"                
## [51] "Photo.2"                 "Photo.3"                
## [53] "Photo.4"                 "Photo.5"                
## [55] "Comments"                "Total.Incidents"
# Since we'll want to categorize these data by the type of complaint that was made in the report, let's look at that attribute:
unique(bears$Complaint.Type)
##  [1] "Food Source Property Damage"                                                            
##  [2] "Food Source Property Damage , Sighting"                                                 
##  [3] "Sighting"                                                                               
##  [4] "Non-Food Property Damage"                                                               
##  [5] "Food Source Property Damage , Non-Food Property Damage , Sighting"                      
##  [6] "Unsubstantiated"                                                                        
##  [7] "Food Source Property Damage , Aggressive Behavior"                                      
##  [8] "Non-Food Property Damage , Sighting"                                                    
##  [9] "Food Source Property Damage , Unsubstantiated"                                          
## [10] "Food Source Property Damage , Non-Food Property Damage"                                 
## [11] "Food Source Property Damage , Sighting , Aggressive Behavior"                           
## [12] "Sighting , Aggressive Behavior"                                                         
## [13] "Sighting , Unsubstantiated"                                                             
## [14] "Non-Food Property Damage , Unsubstantiated"                                             
## [15] "Food Source Property Damage , Non-Food Property Damage , Sighting , Aggressive Behavior"
## [16] "Attack"                                                                                 
## [17] "Non-Food Property Damage , Aggressive Behavior"                                         
## [18] "Aggressive Behavior"                                                                    
## [19] "Non-Food Property Damage , Sighting , Aggressive Behavior"                              
## [20] "Food Source Property Damage , Attack , Aggressive Behavior"                             
## [21] "Food Source Property Damage , Sighting , Unsubstantiated"                               
## [22] "Sighting , Attack"                                                                      
## [23] "Attack , Aggressive Behavior"                                                           
## [24] "Non-Food Property Damage , Attack"                                                      
## [25] "Food Source Property Damage , Aggressive Behavior , Unsubstantiated"                    
## [26] "Food Source Property Damage , Non-Food Property Damage , Aggressive Behavior"           
## [27] "Non-Food Property Damage , Sighting , Unsubstantiated"                                  
## [28] "Sighting , Aggressive Behavior , Unsubstantiated"                                       
## [29] "Food Source Property Damage , Sighting , Attack , Aggressive Behavior"
# Notice how CPW officials record several complaint types for each call - and how each call is often a list of multiple, various complaints.

# Let's make our own new category to make this easier. 

# We use a combination of pmap_chr(), str_detect() and any(), which asks each data point "does the complaint type contain these words?"
# If it does contain the target string, the case_when() function assigns it a new, more simple category we're calling newcat.  

# There's no replacement after a data point is categorized in case_when; whichever cateogry is detected first is the one used to classify. 
# That means the order of our classification in case_when() really matters here. 

# For example, if a bear was at a food source but acting aggressively, our new categorization counts that as "Aggressive Behavior", because we prioritized that classification.



b<- bears %>% mutate(newcat = pmap_chr(select(., contains("Complaint.Type")), 
                        ~ case_when(any(str_detect(c(...), "(?i)Attack")) ~ "Attack",
                                    any(str_detect(c(...), "(?i)Aggressive Behavior")) ~ "Aggressive Behavior",
                                    any(str_detect(c(...), "(?i)Food")) ~ "Food Source",
                                    any(str_detect(c(...), "(?i)Sighting")) ~ "Sighting",
                                    TRUE ~ "Other")))



unique(b$newcat)
## [1] "Food Source"         "Sighting"            "Other"              
## [4] "Aggressive Behavior" "Attack"

Dates and times with lubridate

We began working with dates back in April’s post on snowfall data. Now we’ll add in times, since these data include the time of day that parks official received the bear report. We also add in a new element: a floor date. This summarizes the data by week. We can do the same with the hour, and make data visualizations that correspond to these summary periods.

# Change date and time columns to lubridate format
b$Incident.Date<-as.Date(b$Incident.Date, format="%m/%d/%Y")
b$Incident.Time<-parse_date_time(b$Incident.Time, '%I:%M:%S %p')


# Floor week and floor hour
b$week <- floor_date(b$Incident.Date, "week")
b$hour <-floor_date(b$Incident.Time, "hour")



# Let's try to make a ggplot for bears reports over tie
ggplot(b, aes(x=week, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack", width=6)

# Bar chart is good to go! Time to make it a circle.

Plotting polar charts

I wanted to learn how to make polar charts - sometimes called radar, spider or star charts. And that means YOU get to learn too! Turns out it’s really. Really. Really. Easy. Ggplot2 makes it (almost) as simple as adding + coord_polar() to any ggplot object. Of course there are a few nuances to labeling, but to make your first polar plot, all you do is make a bar chart into a circle!

Here, we take on a little extra cleaning and theme design work. Feel free to customize to your heart’s desire.

# This is my own personal theme object. This helps me have a consistent look and feel to various visualizations without having to cut and paste these elements into every chart. All I have to do is add my theme!

beartheme<- theme(axis.text.y = element_blank(),
      plot.title = element_text(color="grey30", size=14, face="bold", hjust=0.5),
      axis.text.x = element_text(color="grey30", size=12, face="bold"),
      legend.text = element_text(color="grey30", size=10),
      legend.title = element_text(color="grey30", size=12, face="bold"),
      panel.grid = element_line(color="grey80", size=1),
      strip.background = element_rect(fill="grey80", color="transparent"),
      legend.position = "bottom")

# Here's a polar plot of that first bar chart we made. 

all<-ggplot(b, aes(x=week, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack", width=6) + 
  coord_polar() +
  
  # We have to tell ggplot2 that the x axis is in date format to make labelling more friendly:
  
  scale_x_date(date_breaks = "months", date_labels = "%b") +
  theme_minimal() +
  labs(x="", y="", fill="Complaint type", title = "2020 Colorado black bear reports") +
  beartheme + 
  
  # Labeling the "y" axis isn't intuitive in polar coordinates, 
  # so we use geom_text to manually add a few labels for context:
  
  geom_text(x = 250, y = 100, label = "100", color="grey30", size=4, check_overlap = TRUE) +
  geom_text(x = 250, y = 200, label = "200", color="grey30", size=4, check_overlap = TRUE) +
  geom_text(x = 250, y = 300, label = "300", color="grey30", size=4, check_overlap = TRUE)

all

# Now let's separate it out by county. Let's start by making a list of the counties with the most bear reports. 
# Then pull the reports from only those counties. We then get to use one of my favorite ggplot features: facet wrap.

# Facet wrap automatically creates small multiples of plots with equal axes based on a categorical variable - in this case, counties. 

counties<-c("Routt", "Eagle", "Pitkin", "El Paso", 
            "Boulder", "Larimer", "Grand", "Gunnison", "Jefferson")
b_c<-b[b$County %in% counties,]

counties<-ggplot(b_c, aes(x=week, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack", width=6) + 
  coord_polar() +
  scale_x_date(date_breaks = "months", date_labels = "%b") +
  theme_minimal() +
  labs(x="", y="", fill="Complaint type", title = "2020 Colorado black bear reports") +
  beartheme + 
  geom_text(x = 250, y = 20, label = "20", color="grey30", size=3, check_overlap = TRUE) +
  geom_text(x = 250, y = 40, label = "40", color="grey30", size=3, check_overlap = TRUE) + 
  facet_wrap(~County) 

counties

# I know the bears of Steamboat (Routt County) and Vail (Eagle County) quite well.
# I immediately noticed a difference in the annual bear report patterns in these two places.
# Let's make another dataframe with just Routt and Eagle counties to get a closer look.

counties2<-c("Routt", "Eagle")
b_c2<-b[b$County %in% counties2,]

routt.eagle<-ggplot(b_c2, aes(x=week, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack", width=6) + 
  coord_polar() +
  scale_x_date(date_breaks = "months", date_labels = "%b") +
  theme_minimal() +
  labs(x="", y="", fill="Complaint type", title = "2020 Colorado black bear reports") +
  beartheme + 
  geom_text(x = 250, y = 20, label = "20", color="grey30", size=3, check_overlap = TRUE) +
  geom_text(x = 250, y = 40, label = "40", color="grey30", size=3, check_overlap = TRUE) + 
  facet_wrap(~County)  

routt.eagle

Look at that major spike in bear sightings in the fall in Eagle County. Any theories on Eagle vs. Routt black bear activity patterns? Could these be an artifact of a difference in how humans report bears in the fall, or is it reflecting actual differences in behavior due to food sources or elevation?

What about time?

What time of day do bears get into trash? Or roam around town or growl at tourists? These data have detailed times for each report. We can plot the time using the floor hour we set above.

# Let's start with reports for all of Colorado again:

all.time<-ggplot(b, aes(x=hour, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack") + 
  coord_polar() +
  
  # Change scale_x_date to scale_x_datetime
  
  scale_x_datetime(labels = date_format("%I:%M %p")) +
  theme_minimal() +
  labs(x="", y="", fill="Complaint type", title = "2020 Colorado black bear reports: time of complaint") +
  beartheme + 
  geom_text(x = 250, y = 100, label = "100", color="grey30", size=4, check_overlap = TRUE) +
  geom_text(x = 250, y = 200, label = "200", color="grey30", size=4, check_overlap = TRUE) +
  geom_text(x = 250, y = 300, label = "300", color="grey30", size=4, check_overlap = TRUE)


all.time

# Let's check on Routt and Eagle again to see if these bears have different sleep schedules:

routt.eagle.time<-ggplot(b_c2, aes(x=hour, fill=newcat)) +
  scale_fill_brewer(palette="Dark2") +
  stat_count(geom="bar", position="stack") + 
  coord_polar() +
  scale_x_datetime(labels = date_format("%I:%M %p")) +
  theme_minimal() +
  labs(x="", y="", fill="Complaint type", title = "2020 Colorado black bear reports: time of complaint") +
  beartheme + 
 geom_text(x = 250, y = 10, label = "10", color="grey30", size=3, check_overlap = TRUE) +
  geom_text(x = 250, y = 20, label = "20", color="grey30", size=3, check_overlap = TRUE) + 
  facet_wrap(~County)


routt.eagle.time

Interesting! It looks like Routt bears are more active (or at least trigger more reports) between 6 a.m. and noon, while a much greater share of Eagle bear reports take place between 6 p.m. and midnight. Still, even in Routt, bears are hungry at night. I can relate to that.


Thanks for coming along for the ride as we dove a little deeper into data visualization techniques and explored the temporal elements of Colorado black bear reports. If you have any questions, corrections or suggestions, please email me!.

Next post drops June 8*

*Approximately