This tutorial accompanies the Small Town Big Data blog post Towns, they are a’changin’. Read it here.
This tutorial offers an introduction to working with National Land Cover Database (NLCD) rasters in R. We’ll cover the basics of raster data: loading and stacking NLCD layers, plotting categorical land cover with levelplot(), animating a time series, summarizing change with for loops, and visualizing proportions with treemaps.
Software: If you’re a beginner, I recommend downloading RStudio. Open up a new .R file and start cutting and pasting!
Data: You can peruse and download NLCD data here
The projection is a little funky, and cropping and clipping to a county boundary can take a lot of processing time. To give you a little head start, I’ve projected, clipped and trimmed seven timesteps of Routt County NLCD for you, and included it in this repository. But feel free to download your own area of interest and let me know what you find!
Let’s jump right in.
# Set your working directory to the /data folder included in the repository using setwd().
# Install the following packages for this tutorial:
# install.packages(c("FedData", "rasterVis", "raster", "sp", "rgdal", "reshape2", "treemapify", "ggplot2", "kableExtra", "animation", "scales"))
library(FedData)
library(rasterVis)
library(raster)
library(sp)
library(rgdal)
library(reshape2)
library(treemapify)
library(ggplot2)
library(kableExtra)
library(animation)
library(scales)
# Load a single raster using the raster() function.
r <- raster("nlcd_1.tif")
# What does a raster object look like in R? What are the key attributes of this data format?
r
## class : RasterLayer
## dimensions : 4311, 2513, 10833543 (nrow, ncol, ncell)
## resolution : 30.1, 29.6 (x, y)
## extent : 289422, 365063.3, 4417034, 4544639 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs
## source : C:/Users/ncinglis/Desktop/stbd/nlcd/nlcd_1.tif
## names : nlcd_1
## values : 11, 95 (min, max)
# We have seven different years of NLCD data. The stack() function takes as many rasters as you give it and stacks them into one object.
# If they have the same resolution, extent and spatial reference, this is an easy way to conduct operations on multiple layers at the same time.
# We can load the NLCD series straight into a stack. I do this so often that I wrote a little function that takes a regular expression and creates a raster stack from the files that match it.
list2stack <- function(x) {
  z <- list.files(pattern = x)
  y <- raster::stack(z)
  return(y)
}
# If you want to get into regular expressions, there are entire books on the subject. Just trust me that this one works:
p<-"^nlcd_.*\\.tif$"
s<-list2stack(p)
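If you want to see exactly what that pattern does, here is a quick base-R check. The file names below are made up for illustration; only those that start with "nlcd_" and end in ".tif" should match.

```r
# Hypothetical file names, including a sidecar file that should NOT match
files <- c("nlcd_1.tif", "nlcd_7.tif", "nlcd_1.tif.aux.xml",
           "county.tif", "readme.txt")
p <- "^nlcd_.*\\.tif$"
# The ^ anchor requires the nlcd_ prefix; the $ anchor requires
# the name to end at .tif, which rules out the .aux.xml sidecar
grepl(p, files)
```

The anchors are what keep stray files (like GDAL's .aux.xml sidecars) out of your stack.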
# Let's plot this stack and take a look at it.
plot(s)
National Land Cover Database has a standardized legend with specific colors that correspond to land cover classes. We want to visualize Routt County land cover in a manner consistent with other NLCD maps. This isn’t exactly trivial in R, but with a few steps, we can plot a land cover map with code that we can use forever and ever!
# Load in the NLCD legend, colors and descriptions from package FedData.
legend<-pal_nlcd()
legend
## class code description color
## 1 water 11 Open Water #476BA0
## 2 water 12 Perennial Ice/Snow #D1DDF9
## 3 developed 21 Developed, Open Space #DDC9C9
## 4 developed 22 Developed, Low Intensity #D89382
## 5 developed 23 Developed, Medium Intensity #ED0000
## 6 developed 24 Developed, High Intensity #AA0000
## 7 barren 31 Barren Land (Rock/Sand/Clay) #B2ADA3
## 8 forest 41 Deciduous Forest #68AA63
## 9 forest 42 Evergreen Forest #1C6330
## 10 forest 43 Mixed Forest #B5C98E
## 11 shrubland 51 Dwarf Scrub #A58C30
## 12 shrubland 52 Scrub/Shrub #CCBA7C
## 13 herbaceous 71 Grassland/Herbaceous #E2E2C1
## 14 herbaceous 72 Sedge/Herbaceuous #C9C977
## 15 herbaceous 73 Lichens #99C147
## 16 herbaceous 74 Moss #77AD93
## 17 planted 81 Pasture/Hay #DBD83D
## 18 planted 82 Cultivated Crops #AA7028
## 19 wetlands 90 Woody Wetlands #BAD8EA
## 20 wetlands 95 Emergent Herbaceous Wetlands #70A3BA
# Love that. I wish I had found FedData 4 years ago.
# Now, we make a vector of all the values we have in our study area and select those from the legend object.
vals<-unique(s[[6]])
df<-legend[legend$code %in% vals,]
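That `%in%` subsetting keeps only the legend rows whose codes actually occur in the raster. A toy version of the same idea, with a made-up mini-legend rather than the real NLCD one:

```r
# Made-up mini-legend for illustration
toy_legend <- data.frame(code  = c(11, 12, 21, 42),
                         class = c("water", "water", "developed", "forest"))
# Pretend these are the unique cell values found in a raster layer
toy_vals <- c(11, 21, 42)
# Keep only legend rows whose code appears among the raster values;
# code 12 (Perennial Ice/Snow in the real legend) drops out
toy_sub <- toy_legend[toy_legend$code %in% toy_vals, ]
toy_sub
```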
# Alright, now let's make this plot pretty, working with 2016 data. We're going to use levelplot() from package rasterVis. First, we need to recognize it as a categorical raster using ratify():
rat<-ratify(s[[6]])
# I used some code from the creator of rasterVis to make a custom legend:
myKey <- list(rectangles=list(col = df$color),
text=list(lab=df$description),
space='left',
columns=1,
size=2,
cex=.6)
# And here we plot it minus the axes, ticks and labels.
levelplot(rat, att='ID',
col.regions=df$color,
par.settings = list(axis.line = list(col = "transparent"),
strip.background = list(col = 'transparent'),
strip.border = list(col = 'transparent')),
scales = list(col = "transparent"),
colorkey=F,
key=myKey)
# Gorgeous. Let's animate it.
We can animate a series of rasters to show land cover change over time. All we need to do is take that levelplot() call and loop through each layer of our raster stack. We use the animation package to save the result as a GIF.
# Make a list of years in string format for titles
years.list<-list("2001", "2004", "2006", "2008", "2011", "2013", "2016")
saveGIF(
{
for(i in c(1:nlayers(s))){
rat<-ratify(s[[i]])
a<-levelplot(rat, att='ID',
col.regions=df$color,
par.settings = list(axis.line = list(col = "transparent"),
strip.background = list(col = 'transparent'),
strip.border = list(col = 'transparent')),
scales = list(col = "transparent"),
main=paste0("Routt County land cover ", years.list[[i]]),
colorkey=F)
print(a)
}
}, interval=0.8, movie.name="szoom.gif", ani.width = 600)
# Check your working directory for the gif file. It should look like this:
Let’s run some numbers.
Raster is faster. The beauty of working with rasters is that a raster object behaves much like a regular data frame. Each data point essentially has three attributes: an x position, a y position, and the value itself. But because it's stored as a single gridded image, it's much friendlier to work with than a vector file. It may look like a pixelated generalization of reality. It kind of is. But this image is soaked with information, and we can do math on it.
So let’s explore some basics of Routt County land cover change with the information embedded in our NLCD raster stack.
And in the process, we’re going to learn a handy technique that you saw in the animation section: the for loop.
# We have this data frame of NLCD classes:
df
## class code description color
## 1 water 11 Open Water #476BA0
## 2 water 12 Perennial Ice/Snow #D1DDF9
## 3 developed 21 Developed, Open Space #DDC9C9
## 4 developed 22 Developed, Low Intensity #D89382
## 5 developed 23 Developed, Medium Intensity #ED0000
## 6 developed 24 Developed, High Intensity #AA0000
## 7 barren 31 Barren Land (Rock/Sand/Clay) #B2ADA3
## 8 forest 41 Deciduous Forest #68AA63
## 9 forest 42 Evergreen Forest #1C6330
## 10 forest 43 Mixed Forest #B5C98E
## 12 shrubland 52 Scrub/Shrub #CCBA7C
## 13 herbaceous 71 Grassland/Herbaceous #E2E2C1
## 17 planted 81 Pasture/Hay #DBD83D
## 18 planted 82 Cultivated Crops #AA7028
## 19 wetlands 90 Woody Wetlands #BAD8EA
## 20 wetlands 95 Emergent Herbaceous Wetlands #70A3BA
# And, using the freq() function, we can count the number of cells in each class.
freq(s[[1]])
## value count
## [1,] 11 21074
## [2,] 12 282
## [3,] 21 37603
## [4,] 22 14354
## [5,] 23 3741
## [6,] 24 1082
## [7,] 31 12637
## [8,] 41 2118121
## [9,] 42 1575853
## [10,] 43 224290
## [11,] 52 2372042
## [12,] 71 116720
## [13,] 81 205291
## [14,] 82 26121
## [15,] 90 95329
## [16,] 95 59609
## [17,] NA 3949394
# So our goal: create a data frame (a table) where each row is a land cover class, and there is one column for each time step with the pixel count for each class. I find the greatest challenge of building a for loop is conceptualizing its structure.
# So I usually start by experimenting and finding a method that works on one layer:
test<-merge(df, freq(s[[1]]), by.x="code" ,by.y="value", all.y=F, all.x=T)
names(test)[ncol(test)]<-2001
test
## code class description color 2001
## 1 11 water Open Water #476BA0 21074
## 2 12 water Perennial Ice/Snow #D1DDF9 282
## 3 21 developed Developed, Open Space #DDC9C9 37603
## 4 22 developed Developed, Low Intensity #D89382 14354
## 5 23 developed Developed, Medium Intensity #ED0000 3741
## 6 24 developed Developed, High Intensity #AA0000 1082
## 7 31 barren Barren Land (Rock/Sand/Clay) #B2ADA3 12637
## 8 41 forest Deciduous Forest #68AA63 2118121
## 9 42 forest Evergreen Forest #1C6330 1575853
## 10 43 forest Mixed Forest #B5C98E 224290
## 11 52 shrubland Scrub/Shrub #CCBA7C 2372042
## 12 71 herbaceous Grassland/Herbaceous #E2E2C1 116720
## 13 81 planted Pasture/Hay #DBD83D 205291
## 14 82 planted Cultivated Crops #AA7028 26121
## 15 90 wetlands Woody Wetlands #BAD8EA 95329
## 16 95 wetlands Emergent Herbaceous Wetlands #70A3BA 59609
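The `all.x=T` / `all.y=F` arguments in that merge matter: classes that are in the legend but absent from the raster are kept (with NA counts), while stray raster values that aren't legend codes get dropped. A minimal illustration with made-up codes and counts:

```r
# Toy legend and toy frequency table (code 99 is not a legend code)
toy_legend <- data.frame(code  = c(11, 21, 41),
                         class = c("water", "developed", "forest"))
toy_freq   <- data.frame(value = c(11, 41, 99),
                         count = c(100, 900, 5))
# Same merge shape as above: keep all legend rows, drop unmatched values
m <- merge(toy_legend, toy_freq, by.x = "code", by.y = "value",
           all.x = TRUE, all.y = FALSE)
m
# Code 21 survives with an NA count; the unknown value 99 is dropped
```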
# OK, looks good. Now, we want to do this for each year. We could copy and paste and do the above for all seven layers.
# But what if we had 100 layers? What happens when we want to repeat this task on another dataset with different names?
# We need to automate, and we do that by looping through each layer to perform the task we wrote above.
# We'll make a vector of all the years in our study period, then loop through the positions (1-7), merging the frequency information into the data frame and naming the column after that year.
years<-c(2001, 2004, 2006, 2008, 2011, 2013, 2016)
d<-df
for (i in 1:length(years)) {
d<-merge(d, freq(s[[i]]), by.x="code" ,by.y="value", all.y=F, all.x=T)
names(d)[ncol(d)]<-paste0("pix_", years[[i]])
}
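The naming trick — renaming the last column immediately after each merge — is what keeps that loop tidy. Here's the same pattern on a toy data frame (made-up counts standing in for the freq/merge step):

```r
toy_years  <- c(2001, 2016)
toy_counts <- list(c(100, 900), c(120, 850))  # made-up pixel counts per year
toy <- data.frame(code = c(11, 41))
for (i in seq_along(toy_years)) {
  toy <- cbind(toy, toy_counts[[i]])           # stand-in for the merge step
  # The new column is always the last one, so rename it right away
  names(toy)[ncol(toy)] <- paste0("pix_", toy_years[i])
}
toy
```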
# Now, a few percent change calculations:
d$square.mile.change<-(d$pix_2016 - d$pix_2001) * 30 * 30 * 0.00000038610
d$percentchange<-(d$pix_2016 - d$pix_2001)/d$pix_2001
d$prop2016<-d$pix_2016/sum(d$pix_2016)
d$percentchange.2001.2016<-paste(round(100*d$percentchange, 2), "%", sep="")
d$percent.area.2016<-paste(round(100*d$prop2016, 2), "%", sep="")
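A quick sanity check on the square-mile conversion: each NLCD cell is 30 m x 30 m = 900 square meters, and one square meter is about 3.8610e-7 square miles. Using the Open Water counts from the frequency output above (21,074 cells in 2001, 24,124 in 2016):

```r
cells_2001 <- 21074
cells_2016 <- 24124
# (cell difference) * 900 m^2 per cell * sq mi per m^2
sq_mi <- (cells_2016 - cells_2001) * 30 * 30 * 0.00000038610
sq_mi
# About 1.06 square miles of new open water
```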
# Make it a pretty table with kable:
kable(d[,c(3,12,15,16)]) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
description | square.mile.change | percentchange.2001.2016 | percent.area.2016 |
---|---|---|---|
Open Water | 1.0598445 | 14.47% | 0.35% |
Perennial Ice/Snow | 0.0000000 | 0% | 0% |
Developed, Open Space | 0.3301155 | 2.53% | 0.56% |
Developed, Low Intensity | 0.2567951 | 5.15% | 0.22% |
Developed, Medium Intensity | 0.2536677 | 19.51% | 0.06% |
Developed, High Intensity | 0.0757528 | 20.15% | 0.02% |
Barren Land (Rock/Sand/Clay) | -0.0368339 | -0.84% | 0.18% |
Deciduous Forest | -2.0877199 | -0.28% | 30.68% |
Evergreen Forest | -32.0427479 | -5.85% | 21.55% |
Mixed Forest | -0.4566019 | -0.59% | 3.24% |
Scrub/Shrub | 2.9488001 | 0.36% | 34.58% |
Grassland/Herbaceous | 24.8292030 | 61.22% | 2.73% |
Pasture/Hay | -1.3180296 | -1.85% | 2.93% |
Cultivated Crops | 6.5727734 | 72.41% | 0.65% |
Woody Wetlands | -0.0865250 | -0.26% | 1.38% |
Emergent Herbaceous Wetlands | -0.2984939 | -1.44% | 0.85% |
# And just for fun, aggregate it by class for another perspective
dfc<-aggregate(prop2016 ~ class, d[,c(2,13,14)], sum)
dfc$prop2016<-paste(round(100*dfc$prop2016, 2), "%", sep="")
kable(dfc) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
class | prop2016 |
---|---|
barren | 0.18% |
developed | 0.86% |
forest | 55.47% |
herbaceous | 2.73% |
planted | 3.58% |
shrubland | 34.58% |
water | 0.35% |
wetlands | 2.23% |
For years, researchers in perception and data visualization have warned about the dangers of pie charts. The bottom line is this: Humans aren’t great at comparing the sizes of angular shapes, such as the slice of a pie. So if two pie slices are close together in size, it’s hard for us to tell which one is bigger.
Enter the treemap. Because you know what is easier for our eyes to compare? Rectangles. It’s that simple. Pie charts, but with rectangles. Treemaps are especially useful for visualizing hierarchical data. Here, we’ll look at Routt County land cover proportions with treemaps. The package treemapify enables plotting them through ggplot2. Next time someone asks you to make a pie chart, I hope you consider making one of these instead! Let’s make treemaps happen in mainstream data viz.
# To start, we need to shift the data frame from "wide" format to "long" format, because ggplot2 likes it that way.
head(d)
## code class description color pix_2001 pix_2004 pix_2006
## 1 11 water Open Water #476BA0 21074 21610 22257
## 2 12 water Perennial Ice/Snow #D1DDF9 282 282 282
## 3 21 developed Developed, Open Space #DDC9C9 37603 37603 38028
## 4 22 developed Developed, Low Intensity #D89382 14354 14354 14727
## 5 23 developed Developed, Medium Intensity #ED0000 3741 3741 3995
## 6 24 developed Developed, High Intensity #AA0000 1082 1082 1124
## pix_2008 pix_2011 pix_2013 pix_2016 square.mile.change percentchange
## 1 22630 22660 22828 24124 1.05984450 0.14472810
## 2 282 282 282 282 0.00000000 0.00000000
## 3 38028 38472 38472 38553 0.33011550 0.02526394
## 4 14727 15020 15020 15093 0.25679511 0.05148391
## 5 3995 4382 4382 4471 0.25366770 0.19513499
## 6 1124 1269 1269 1300 0.07575282 0.20147874
## prop2016 percentchange.2001.2016 percent.area.2016
## 1 3.504282e-03 14.47% 0.35%
## 2 4.096367e-05 0% 0%
## 3 5.600256e-03 2.53% 0.56%
## 4 2.192428e-03 5.15% 0.22%
## 5 6.494630e-04 19.51% 0.06%
## 6 1.888396e-04 20.15% 0.02%
dn<- melt(d, id.vars = 1:4)
# Check out what that did - see how all of the land cover types are in one column now?
dn$value<-as.numeric(dn$value)
## Warning: NAs introduced by coercion
head(dn)
## code class description color variable value
## 1 11 water Open Water #476BA0 pix_2001 21074
## 2 12 water Perennial Ice/Snow #D1DDF9 pix_2001 282
## 3 21 developed Developed, Open Space #DDC9C9 pix_2001 37603
## 4 22 developed Developed, Low Intensity #D89382 pix_2001 14354
## 5 23 developed Developed, Medium Intensity #ED0000 pix_2001 3741
## 6 24 developed Developed, High Intensity #AA0000 pix_2001 1082
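If you'd rather not load reshape2, base R's reshape() does the same wide-to-long pivot. A toy sketch with made-up counts, just to show the shape of the result:

```r
# Toy wide data frame: one row per class, one column per year
wide <- data.frame(class    = c("water", "forest"),
                   pix_2001 = c(100, 900),
                   pix_2016 = c(120, 850))
# Base-R equivalent of melt(wide, id.vars = 1)
long <- reshape(wide, direction = "long",
                varying = c("pix_2001", "pix_2016"),
                v.names = "value", timevar = "variable",
                times   = c("pix_2001", "pix_2016"))
long
# One row per class-year combination, much like melt()'s output
```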
# For this first treemap, we want to summarize the land cover area by class, not individual description. So we need to aggregate by adding all of the values for each class together.
dc<-aggregate(value ~ class, dn[dn$variable=="pix_2016",], sum)
head(dc)
## class value
## 1 barren 12531
## 2 developed 59417
## 3 forest 3818730
## 4 herbaceous 188173
## 5 planted 246534
## 6 shrubland 2380528
# Here are the colors from the palette dataframe in the order we'll need them for this plot. I also made a vector of labels so that I could format them exactly the way I want.
class_pal<-c("#B2ADA3", "#ED0000", "#68AA63", "#B5C98E", "#DBD83D", "#CCBA7C", "#476BA0", "#BAD8EA" )
class_labels<-c("barren", "developed", "forest", "grassland", "agriculture", "shrubland", "water", "wetlands")
# Treemaps are pretty straightforward in ggplot2. The relevant aesthetics are area, fill and label.
# Area is the size of the box - assign that to the area value in your dataframe.
# Fill is the color of the box. In this example we want the color to correspond to land cover class.
# Assign the label vector from above to the label aesthetic.
# (Named tm rather than c, so we don't mask base R's c() function.)
tm<-ggplot(dc, aes(area=value, fill=class, label=class_labels)) +
  geom_treemap() +
  geom_treemap_text(grow=T, color="grey20", place="bottomleft")+
  scale_fill_manual(values=class_pal)+
  theme(legend.position = "none")
tm
# Treemaps are ideal for hierarchical data. In this instance, we want to keep the colors for the aggregated classes, but break each box down by land cover description. We do this by adding the "subgroup" argument.
# (Named tm2 rather than t, so we don't mask base R's t() function.)
tm2<-ggplot(dn[dn$variable=="pix_2016",], aes(area=value, fill=class, subgroup=class, label=tolower(description))) +
  geom_treemap() +
  geom_treemap_text(grow=T, color="grey20", place="bottomleft", reflow=T) +
  scale_fill_manual(values=class_pal)+
  theme(legend.position = 'none')+
  geom_treemap_subgroup_border(color="grey80", size=2)
tm2
We went over a LOT in this tutorial. Basic raster functions, levelplots, for loops, animations and treemaps. If you have any questions at all, feel free to email me!
Next post drops March 2.