The USDA ERS Feed Grain Database contains among other things historical entries from the WASDE report. I need this information a lot both in teaching and research. It’s absolutely essential to maintain an accurate historical view of our most important food crops. But the data is presented in an Excel sheet that’s great for viewing, but really hard to work with and automatically analyze.
I have spent many hours manually creating dates dragging them by hand in order to create a basic chart from this dataset.
So today I wrote an R script that grabs the most recent spreadsheet imports it into R, and makes it “tidy”.
library(tidyverse)
library(readxl)
library(httr)
library(tidyr)
tmp <- tempfile(fileext = ".xls")
httr::GET(url = "https://www.ers.usda.gov/webdocs/DataFiles/50048/Feed%20Grains%20Yearbook%20Tables-All%20Years.xls",
write_disk( tmp) )
data_xl <- read_excel(tmp, sheet = "FGYearbookTable04-Full",
skip = 4,
col_names = c("MktYear", "Quarter", "BeginningStocks", "Production", "Imports",
"TotalSupply", "FoodAlcoholInd", "Seed", "FeedandResidual",
"TotalDomestic", "Exports", "TotalUse", "EndingStocks")) %>%
fill(MktYear) %>%
# Make columns for more standard date handling
mutate(month = ifelse(Quarter =="Q1 Sep-Nov", "09-01",
ifelse(Quarter == "Q2 Dec-Feb", "12-01",
ifelse(Quarter == "Q3 Mar-May", "03-01",
ifelse(Quarter == "Q4 Jun-Aug", "06-01",
ifelse(Quarter == "MY Sep-Aug", "09-01", NA)))))) %>%
mutate(year = ifelse(Quarter =="Q1 Sep-Nov" | Quarter == "Q2 Dec-Feb" | Quarter == "MY Sep-Aug", as.numeric(str_sub(MktYear, 1,4)),
as.numeric(str_sub(MktYear, 1,4)) +1 )) %>%
mutate(date = as.Date(paste0(as.character(year), "-", month))) # Now we have a proper date column.
Making the data “Tidy”
Now lets transform the data from ‘wide’ formet to ‘long’ format so we can make nice graphs with ggplot2
.
# Make tidy getting ready for ggplot
data_tidy <- data_xl %>%
select(c(-MktYear, -month, -year)) %>%
pivot_longer(c(-date, -Quarter), names_to = "UseCategory", values_to = "Value")
data_tidy
## # A tibble: 2,959 x 4
## Quarter date UseCategory Value
## <chr> <date> <chr> <dbl>
## 1 Q1 Sep-Nov 1975-09-01 BeginningStocks 558
## 2 Q1 Sep-Nov 1975-09-01 Production 5841.
## 3 Q1 Sep-Nov 1975-09-01 Imports 0.24
## 4 Q1 Sep-Nov 1975-09-01 TotalSupply 6399.
## 5 Q1 Sep-Nov 1975-09-01 FoodAlcoholInd 124.
## 6 Q1 Sep-Nov 1975-09-01 Seed NA
## 7 Q1 Sep-Nov 1975-09-01 FeedandResidual 928.
## 8 Q1 Sep-Nov 1975-09-01 TotalDomestic 1051.
## 9 Q1 Sep-Nov 1975-09-01 Exports 373.
## 10 Q1 Sep-Nov 1975-09-01 TotalUse 1424.
## # ... with 2,949 more rows
Now let’s plot the corn disappearance
# Plot the Marketing Year Data
data_tidy %>% filter(Quarter == "MY Sep-Aug") %>%
filter(UseCategory %in% c("FoodAlcoholInd", "Seed", "FeedandResidual", "Exports", "EndingStocks")) %>%
ggplot(aes(x = date, y = Value, fill = UseCategory)) +
geom_area() +
theme_bw() +
labs(x = "", y = "Millions of bushels", title = "Corn Use Categories Since 1975")
This is one of those posts I know I will come back to again and again myself. I hope you find it helpful!
This is fantastic.
Thanks!