in Uncategorized

Working with the USDA ERS Feed Grains Database #tidyverse

The USDA ERS Feed Grain Database  contains among other things historical entries from the WASDE report. I need this information a lot both in teaching and research. It’s absolutely essential to maintain an accurate historical view of our most important food crops. But the data is presented in an Excel sheet that’s great for viewing, but really hard to work with and automatically analyze.

I have spent many hours manually creating dates dragging them by hand in order to create a basic chart from this dataset.

So today I wrote an R script that grabs the most recent spreadsheet imports it into R, and makes it “tidy”.

library(tidyverse)
library(readxl)
library(httr)
library(tidyr)


tmp <- tempfile(fileext = ".xls")

httr::GET(url = "https://www.ers.usda.gov/webdocs/DataFiles/50048/Feed%20Grains%20Yearbook%20Tables-All%20Years.xls",
          write_disk( tmp) )
data_xl <- read_excel(tmp, sheet = "FGYearbookTable04-Full", 
                      skip = 4, 
                      col_names = c("MktYear", "Quarter", "BeginningStocks", "Production", "Imports", 
                                    "TotalSupply", "FoodAlcoholInd", "Seed", "FeedandResidual", 
                                    "TotalDomestic", "Exports", "TotalUse", "EndingStocks")) %>%
  fill(MktYear) %>% 
  # Make columns for more standard date handling
  mutate(month = ifelse(Quarter =="Q1 Sep-Nov", "09-01", 
                        ifelse(Quarter == "Q2 Dec-Feb", "12-01", 
                               ifelse(Quarter == "Q3 Mar-May", "03-01", 
                                      ifelse(Quarter == "Q4 Jun-Aug", "06-01",
                                             ifelse(Quarter == "MY Sep-Aug", "09-01", NA)))))) %>%
  mutate(year = ifelse(Quarter =="Q1 Sep-Nov" | Quarter == "Q2 Dec-Feb" | Quarter == "MY Sep-Aug", as.numeric(str_sub(MktYear, 1,4)), 
                       as.numeric(str_sub(MktYear, 1,4)) +1 )) %>%
  mutate(date = as.Date(paste0(as.character(year), "-", month))) # Now we have a proper date column.

Making the data “Tidy”

Now lets transform the data from ‘wide’ formet to ‘long’ format so we can make nice graphs with ggplot2.

# Make tidy getting ready for ggplot
data_tidy <- data_xl %>% 
  select(c(-MktYear, -month, -year)) %>%
  pivot_longer(c(-date, -Quarter), names_to = "UseCategory", values_to = "Value")

data_tidy
## # A tibble: 2,959 x 4
##    Quarter    date       UseCategory       Value
##    <chr>      <date>     <chr>             <dbl>
##  1 Q1 Sep-Nov 1975-09-01 BeginningStocks  558   
##  2 Q1 Sep-Nov 1975-09-01 Production      5841.  
##  3 Q1 Sep-Nov 1975-09-01 Imports            0.24
##  4 Q1 Sep-Nov 1975-09-01 TotalSupply     6399.  
##  5 Q1 Sep-Nov 1975-09-01 FoodAlcoholInd   124.  
##  6 Q1 Sep-Nov 1975-09-01 Seed              NA   
##  7 Q1 Sep-Nov 1975-09-01 FeedandResidual  928.  
##  8 Q1 Sep-Nov 1975-09-01 TotalDomestic   1051.  
##  9 Q1 Sep-Nov 1975-09-01 Exports          373.  
## 10 Q1 Sep-Nov 1975-09-01 TotalUse        1424.  
## # ... with 2,949 more rows

Now let’s plot the corn disappearance

# Plot the Marketing Year Data
data_tidy %>% filter(Quarter == "MY Sep-Aug") %>%
  filter(UseCategory %in% c("FoodAlcoholInd", "Seed", "FeedandResidual", "Exports", "EndingStocks")) %>%
  ggplot(aes(x = date, y = Value, fill = UseCategory)) + 
  geom_area() + 
  theme_bw() +
  labs(x = "", y = "Millions of bushels", title = "Corn Use Categories Since 1975")

This is one of those posts I know I will come back to again and again myself. I hope you find it helpful!

 

Leave a Reply for mindymallory Cancel Reply

Write a Comment

Comment