This tutorial accompanies the Small Town Big Data blog post Wild and scenic: How the Yampa River flows.


There’s nothing like the wiggle of a free-flowing river.

The United States Geological Survey (USGS) monitors the nation’s rivers and streams with a vast system of stream gauges, offering a near-real-time picture of the movement of water and sediment. These gauges provide a constant flow (sorry) of information about the nation’s waterways at a fine temporal resolution: you can find stream discharge data at 15-minute (!!) intervals. But you can also work with daily averages, minimums and maximums, as well as the precipitation and temperature data that accompany some of these stations. Working with USGS flow data is a staple of the environmental sciences: it’s one of the first datasets I ever worked with. If you’re in any kind of watershed, landscape or conservation science, you’ll find yourself working with these data regularly, and that means getting comfortable with some efficient, automated workflows.

In this tutorial, I’ll show you how to load USGS stream gauge data directly into R with the dataRetrieval package. When I discovered this package, I had flashbacks to cutting and pasting tens of thousands of rows of flow data from the web into Excel, and awful memories of cleaning it and loading it into R. Never again! Loading and plotting USGS stream gauge data takes just a few lines of code and involves NO clicking around web pages.

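We’ll cover loading and checking out the dataset, plotting the data with ggplot2, and building interactive HTML charts with Plotly.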

To start, I recommend downloading RStudio. The magic of dataRetrieval is that there’s no need to download any tabular data files. Just open up a new .R file and start cutting and pasting!


Data: USGS stream flow data: Yampa River in Steamboat Springs

Details on the dataRetrieval package and associated codes for loading data directly into R can be found here. Feel free to work with data from any stream gauge: find the one you want here.

Let’s dive in.

Load in and check out the dataset

# Set your working directory to the /data folder included in the repository using setwd(). 
# Install the following packages for this tutorial: 
#install.packages(c("dataRetrieval", "ggplot2", "plotly", "lubridate"))

library(ggplot2)
library(plotly)
library(lubridate)
library(dataRetrieval)


# We start by assigning the objects we'll need to grab the data. Information on the codes can be found at the link in the paragraph above. 

siteNumber <- "09239500" #This is the unique identifier for the Yampa River in Steamboat Springs. 
parameterCd <- "00060" #This code denotes discharge (streamflow) in cubic feet/second
statCd <- "00003" #This will give us the mean daily discharge in cubic feet/second
startDate <- "1980-01-01"  
endDate <- "2021-06-06" 

# We now use these objects to load the data using the readNWISdv() command. You can also load multiple parameters for multiple sites by concatenating (c()) the codes for each object. 

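# statCd is passed explicitly here; "00003" (daily mean) is also the function's default.
d <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd = statCd)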

#And...that's (mostly) it! Unreal easy right? 

# Let's simplify it a bit. We only have one station, so we just want the date and flow columns. 
d<-d[,3:4]
names(d)<-c("date", "cfs")

head(d,30)
##          date cfs
## 1  1980-01-01  74
## 2  1980-01-02  80
## 3  1980-01-03  82
## 4  1980-01-04  81
## 5  1980-01-05  79
## 6  1980-01-06  78
## 7  1980-01-07  76
## 8  1980-01-08  71
## 9  1980-01-09  77
## 10 1980-01-10  77
## 11 1980-01-11  72
## 12 1980-01-12  76
## 13 1980-01-13  80
## 14 1980-01-14  85
## 15 1980-01-15  94
## 16 1980-01-16  92
## 17 1980-01-17  91
## 18 1980-01-18  86
## 19 1980-01-19  86
## 20 1980-01-20  85
## 21 1980-01-21  80
## 22 1980-01-22  77
## 23 1980-01-23  77
## 24 1980-01-24  71
## 25 1980-01-25  74
## 26 1980-01-26  83
## 27 1980-01-27  86
## 28 1980-01-28  85
## 29 1980-01-29  90
## 30 1980-01-30  84
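
Selecting columns by position (d[,3:4]) works for a single site and parameter, but it can be fragile if the column layout ever differs. As a minimal sketch, here is the same step done by name on a toy data frame that stands in for the readNWISdv() result, so it runs without a network connection. The column name pattern X_<parameterCd>_<statCd> is an assumption based on how the package usually labels daily values; check names(d) on your own download, since some gauges carry extra descriptors in the name.

```r
# Toy stand-in for the data frame returned by readNWISdv() (no download needed).
# The three flow values are the first three rows printed above; the "A"
# qualifier codes are placeholders for illustration only.
parameterCd <- "00060"
statCd <- "00003"

raw <- data.frame(
  agency_cd        = "USGS",
  site_no          = "09239500",
  Date             = as.Date(c("1980-01-01", "1980-01-02", "1980-01-03")),
  X_00060_00003    = c(74, 80, 82),
  X_00060_00003_cd = "A",
  stringsAsFactors = FALSE
)

# Build the value column's name from the codes (assumed naming pattern)
flow_col <- paste0("X_", parameterCd, "_", statCd)

# Select by name instead of by position, then rename as in the tutorial
flow <- raw[, c("Date", flow_col)]
names(flow) <- c("date", "cfs")

flow
```

On real data you would swap raw for the d object returned by readNWISdv() before it is trimmed.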

Plot the data with ggplot2

We visualize these data using a time-tested principle of data visualization and design: small multiples. The pioneer of this approach, Edward Tufte, describes the design as “visually enforcing comparisons of changes, of the differences among objects, the scope of alternatives.” (Envisioning Information, 1990). Tufte compares the panels to frames of a movie. We use ggplot’s facet_wrap() function to create multiple panels of river flow data that share consistent axes, look and feel so the eye can quickly learn the patterns, then scan and compare with ease, or as Tufte calls it, “uninterrupted visual reasoning.”

# Make a column of just the year for facetting later. 
d$year<-year(d$date)

# Make a column called "plotdate" that keeps the month and day of each measurement but assigns every row the same year (I used 2021, but it doesn't really matter). This allows us to plot the data by month and day, all on the same axis. 
# d$date is already a Date, so no parsing format is needed. Note that Feb 29 in leap years has no match in 2021, so it becomes NA and is dropped from the plot.

d$plotdate<-as.Date(format(d$date,"2021-%m-%d"))


# Making themes: I chose a deep blue color for the line and built my own ggplot theme object to be able to add to a ggplot object later. This theme object is fairly minimal (no grid lines) with a light grey background. 
riverblue<-"#305c79"
rivertheme<-theme(rect = element_blank(),
                  panel.grid = element_blank(),
                  strip.text = element_text(color="grey30",face="bold", size=13),
                  plot.background = element_rect(fill="#CCCCCC"),
                  panel.background = element_rect(fill="#CCCCCC"),
                  axis.title = element_text(color="grey30")) 

r<-ggplot(d, aes(y=cfs)) +
  geom_line(aes(x=plotdate), color=riverblue) + 
  facet_wrap(~reorder(year, -year), ncol=1, scales="free_x") + 
  scale_x_date(date_breaks = "months", date_labels = "%b",  limits = as.Date(c('2021-01-01','2021-12-31'))) +
  rivertheme +
  labs(x="Month", y="Discharge (cfs)")

r

# And just for fun, a completely bare, minimalist chart. Because I like them. Because the wiggles of the data remind me a lot of the river itself. 

void<-ggplot(d, aes(y=cfs)) +
  geom_line(aes(x=plotdate), color=riverblue) + 
  facet_wrap(~reorder(year, -year), ncol=1, scales= "free_x") + 
  scale_x_date(date_breaks = "months", date_labels = "%b",  limits = as.Date(c('2021-01-01','2021-12-31'))) +
  theme_void() +theme(strip.text = element_blank())

void
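
The plotdate trick behind these charts (moving every observation onto one shared year) has one edge case worth seeing in isolation: leap days. The sketch below uses three toy dates rather than the gauge data, and compares re-basing onto 2021 with re-basing onto 2020, a leap year. Passing format = to as.Date() is my addition so that an impossible date comes back as NA rather than stopping with an error.

```r
# Three toy dates, including a leap day
obs <- as.Date(c("1980-01-01", "1980-02-29", "1980-12-31"))

# Re-base onto 2021 (not a leap year): Feb 29 has no counterpart, so it is NA
in_2021 <- as.Date(format(obs, "2021-%m-%d"), format = "%Y-%m-%d")

# Re-base onto 2020 (a leap year): every day of the year survives
in_2020 <- as.Date(format(obs, "2020-%m-%d"), format = "%Y-%m-%d")

in_2021
in_2020
```

Losing one point every four years barely changes these charts, so staying with 2021 is a reasonable choice. If you do switch the shared year, remember to change the limits in scale_x_date() to match.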


Interactive HTML charts with Plotly

Plotly is one of my favorite open source data visualization tools. Plotly for R allows you to create html widgets that are inherently interactive: they automatically enable panning, zooming and hover info actions, with a built-in navigation bar that also lets users save static versions of the chart. Built on the Plotly.js library, these widgets also open up a world of opportunity for connecting multiple plots and encoding reactions to user events like clicking, hovering and selecting.

Neatly built into R’s plotly library is the function ggplotly() - a simple wrapper for plotly-fying any ggplot object. The conversion isn’t seamless, but it does offer a low-overhead starting point for interactive visualizations. Here’s a quick example of ggplotly() in action, and how to custom code hover text.
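
The hover text itself is nothing more than a character vector built with paste(), one string per row. As a small sketch on two toy rows (the first two days of the dataset printed earlier), this is what those strings look like before plotly ever sees them. The toy data frame and the hover name are just for illustration.

```r
# Two toy rows copied from the start of the dataset printed earlier
toy <- data.frame(
  date = as.Date(c("1980-01-01", "1980-01-02")),
  cfs  = c(74, 80)
)

# paste() is vectorised, so this yields one hover string per row.
# "<br>" is an HTML line break, which shows up as a new line in the tooltip.
hover <- paste("Date:", toy$date, "<br>Discharge (cfs):", toy$cfs)

hover
```

Note that paste() separates its pieces with a space by default, which is why there is a space before the line break tag.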

#To make a plotly chart that allows the user to hover over any day and see the mean discharge, we need to add a new geom into the mix. We don't need to see the points (I made them super small), but the added aes argument "text" is the magic here. The text argument is useless in the ggplot object itself, but when wrapped into ggplotly(), it becomes the code for custom hover text. You can use a combination of regular old HTML, strings and data columns to create your hover text. 


p<-ggplot(d, aes(y=cfs)) +
  geom_point(aes(x=plotdate, text=paste('Date:', date, '<br>Discharge (cfs):', cfs)), color=riverblue,size=0.01) +
  geom_line(aes(x=plotdate), color=riverblue) + facet_wrap(~reorder(year, -year), ncol=1, scales= "free_x") + 
  scale_x_date(date_breaks = "months", date_labels = "%b",  limits = as.Date(c('2021-01-01','2021-12-31'))) +
  rivertheme + 
  labs(x="Month", y="Discharge (cfs)")

g<-ggplotly(p, tooltip='text', autosize = F, width = 750, height = 7500)


# This is a quick loop I found on StackOverflow that fixes the locations of the year labels for each facet. I'm still not sure how to do this in the ggplotly() command itself, but this fix works quite well: 

for (k in 3:44) {
  g$x$layout$annotations[[k]]$x <- .8
  g$x$layout$annotations[[k]]$y <- g$x$layout$annotations[[k]]$y - 0.013
  
}


#And that's all you need for a clickable, hoverable, pannable interactive chart! 
g
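
One caveat about the label-fixing loop: the range 3:44 is tied to this exact date range. As a rough sketch, the indices can be derived from the number of years instead, so the loop keeps working if you change startDate or endDate. This assumes, as the loop's starting value of 3 suggests, that the first two annotations are the axis titles and the facet labels follow in order; inspect g$x$layout$annotations on your own chart to confirm.

```r
# Years covered by the tutorial's date range (1980 through 2021)
years <- 1980:2021
n_facets <- length(unique(years))

# Assumption: annotations 1 and 2 are the axis titles, facet labels start at 3
first_label <- 3
label_idx <- seq(first_label, length.out = n_facets)

n_facets
range(label_idx)
```

With the real data you would use length(unique(d$year)) for n_facets and then write the loop as for (k in label_idx).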

Interactive Plotly charts are a wonderful tool for surprising/impressing your teachers, clients or bosses. There’s so much more to Plotly, including connecting multiple subplots, highlighting and brushing actions. Check out its full potential here. I hope this tutorial gives you the wings you need to take off on your interactive data visualization journey. If you have any questions, corrections or suggestions, please email me!

Next post drops July 13*

*Approximately