*This is the second in a series of posts where I document my own process in figuring out how machine learning relates to the classic econometrics one learns in a graduate program in economics. Note that I am a humble practitioner of applied econometrics, so the ideas I am working through are not new, or may first take on them might not even be fully correct. But, I think many of us applied economists are starting to learn and dabble in this area and I thought it might be fruitful to community if I learn publicly in these blog posts. These posts are certainly going to solidify my own understanding and serve as helpful notes for my own purposes.*

I had at least three people reach out after my first machine learning post – two in academic positions saying that they are starting to use these methods in their own research and one former classmate in the corporate world who reached out to say these methods are already deeply embedded in their business. Further I saw @DrKeIthHCoble, Ashok Mishra, @shanferrell, and @SpacePlowboy just dropped an AEPP article on the topic, Big Data in Agriculture: A Challenge for the Future. I’m even more convinced now is a really great time to invest in learning this machine learning stuff. At the end I do an Ag example by fitting models of the relationship between corn prices received by farmers and stocks to use.

# Flexibility versus Interpretability

I started going through An Introduction to Statistical Learning, and the first thing that struck me is that linear regression gets two whole chapters devoted to it (Chapters 3 and 6). I was a bit surprised because my naive view was that linear regression was something primitive and completely separate from the concepts of machine learning, whereas in this book, linear regression is presented as a baseline or starting place for the concepts of machine learning. This left me with a question. Why is econometrics as usually presented as something separate methodologically from machine learning?

In chapter 2, I think I got the answer to my question. The authors note that there is always a tension between flexibility and interpret ability of a model. Linear regression is quite inflexible, but it is readily interpret-able. Non-linear models can be quite complex in how an independent variable is functionally related to the dependent variable. In economics, we care about interpretability. We go through great pains to find instruments, design experiments, get random samples, in order to argue causality of one variable on another. Inference and interpretability matter a great deal to us, so inflexible models are often attractive for this purpose. Additionally, less flexible models often perform fairly well compared to more flexible models in terms of predictive accuracy, at least in out of sample testing, due to the problem of over-fitting in flexible models. So you get a lot of mileage out of an inflexible model like linear regression, and more flexible models might not perform much better on medium data anyway.

Our analysis is often naturally constrained in terms of the number of independent variables one might include. First, economic theory often suggests what things might be important independent variables, so there is less need to throw everything into a regression and see if it predicts. Or, you might be constrained by the scarcity of relevant data. If you conduct a survey, you can only ask so many questions and then you only have that many potential independent variables. To use a concept from my first machine learning post, we often have ‘medium data’ not big data.

I think there are good reasons why econometrics has gravitated to linear regression models as our bread and butter.

Econometrics:

- We care very much about inference, less emphasis on prediction \(\Rightarrow\) Linear regression often works well
- We often have medium data \(\Rightarrow\) Linear regression often works well.

So does that mean we should scoff at this statistical learning stuff? No I don’t think so. Reading into chapter 5 (Re-sampling Methods) and chapter 6 (Linear Model Selection and Regularization), I think there are really nice opportunities to get a lot of benefit even in a fairly inflexible model and even with medium data. Chapter 5 covers re-sampling methods including Cross-Validation and Bootstrap methods. Bootstrap is ubiquitous in econometrics, but cross-validation could be successfully utilized more, I think. Among those of us working with large ‘medium data’, like the Census or Ag Census, the methods are chapter 6 are fairly commonly employed, I think.

Statistical Learning:

- More focus on prediction \(\Rightarrow\) Larger baseline set of models to choose from. Just so long as it predicts, who cares about causality!
- Means much more effort exerted at the model selection stage.

Chapters 2 and 5 discuss the bias-variance trade-off in model selection and the k-Fold cross validation method of model selection, respectively. I briefly summarize these concepts next.

# Bias-Variance Tradeoff

A key to mastering model assessment and model selection is to understand the bias-variance trade-off. If you have a model, \(y = f(x) + \epsilon\), so that \(y\) is a function of \(f\) plus some noise. Then, if we want to find out how \(x\) affects \(y\), or use known values of \(x\) to predict \(y\), we need to find a function \(\hat{f}(x)\) that is as close as possible to \(f\). The most common way to assess if our \(\hat{f}\) is good or not is with mean squared error (MSE), or root mean squared error where we take the square root of the MSE. MSE is defined like this.

\[MSE = \frac{1}{n}\sum_{i=1}^n (y_i – \hat{f}(x_i))^2 \]

It just takes all the model’s predictions, \(\hat{f}(x_i)\), calculates the square of how far they are from the actual value, \(y_i\), and averages them. So a low MSE is good – your predictions were not far away from the actual values.

So what contributes to a model’s errors? Head over to Wikipedia for a proof, but for any particular observation the expectation of the terms contributing to the MSE can be broken down as follows,

\[E(y_i – \hat{f}x_i)^2 = Var(\hat{f}(x_i)) + Bias(\hat{f}(x_i))^2 + Var(\epsilon_i),\]

where, \(Bias(\hat{f}(x_i)) = E(\hat{f}(x_i) – f(x_i))\). Clearly to get a low MSE you need a model that will give you low variance and low bias. But, typically these things move in opposite directions. You get a high variance in \(\hat{f}\) with a more flexible model because when a model is flexible (think for example of a high degree polynomial versus a linear model where the polynomial has many more parameters to estimate) different training data sets will make the fitted function ‘wiggle’ around much more than the linear model. Conversely, models that are less flexible have higher bias, because it can not accommodate the true shape of the data generating process as easily as a more flexible model.

In ISL, figure 2.11 illustrates this very well.

*This figure is taken from “An Introduction to Statistical Learning, with applications in R” (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani*

Here you see a scatter-plot of a data set with variables X and Y as circles in the left panel. To model the data they fit a linear model (in gold), and increasingly higher degree polynomials in black, blue, and green. Of course, the higher the degree of the polynomial the more ‘wiggly’ the \(\hat{f}\). Then in the right panel, they plot the MSE of these different candidate \(\hat{f}\)’s. The training set MSE is in grey and the test set (out of sample in the parlance of price forecasting) MSE is in red.

This illustrates an important point. Yes, variance and bias of a model move in opposite directions, but not necessarily at the same rate (as flexibility increases I mean). Here, when you move from a linear, to a 3rd degree, to a 5th degree polynomial, you get dramatic reductions in the MSE of both the training set and the test set.

This is because the reduction in bias as you move away from the linear model overwhelms the increase in variance as you move toward a higher degree polynomial. Though, after about a 5th degree polynomial, the gains disappear and you do not get much more out of adding higher polynomial terms to the \(\hat{f}\). So, if this were a model selection exercise in a real research problem, we would probably select a 5th degree polynomial.

The approach of using MSE for model selection seems to be pretty ubiquitous in machine learning contexts. In economics, it is commonly used to select among competing forecasting models, but I don’t think it is used that widely for model selection in models where inference is the goal.

# Cross Validation

Cross validation is a technique for model selection that I do not believe is widely used by economists, but I think it has a lot of promise. It does not seem to me to require a particularly large data set, so it could be an effective tool even on our ‘medium’ data sets.

The idea behind it is pretty simple. At the model selection stage, you will have a number of candidate models. Conceptually, the example above is pretty simple, your candidate models are polynomials of higher and higher degree. But, you could have a number of candidate models that look very different from one another. The MSE approach to model selection is to just see which one predicts the best on a ‘hold out’ data set. That is basically what we did in the example shown in figure 2.11.

Cross Validation just takes this one step further. In k-Fold cross validation you do the MSE exercise with k different randomly selected hold out samples for every model you are considering. An illustration for a 5-Fold cross validation sample split is shown in figure 5.5 from the ISL book.

*This figure is taken from “An Introduction to Statistical Learning, with applications in R” (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani*

The advantage of this is that it allows the variance and bias of the fitted models to be revealed a little more completely. By using different subsets of your data as the test set you get a sense of how much individual observations are moving around your MSE estimates. When conducting a k-Fold cross validation, you typically just average the MSE’s given by each of the k model estimates.

# An Ag Example

To illustrate the potential of k-Fold cross validation in agricultural economics I will use it in a simple example on a pretty small data set. Every month the USDA puts out the World Agricultural Supply and Demand Estimates (WASDE). This report is followed very closely, especially during the growing season and harvest of commodities important to U.S. agriculture. It publishes what are called ‘balance sheets’ because they state in a table expectations for supply and demand from various sources for each commodity.

For store-able commodities the WASDE also have an estimate of ending stocks, the amount of the grain that is not consumed in the same marketing year it was grown in, and thus carried over to be consumed during the next marketing year. Ending stocks are an important indicator or scarcity or surplus, and thus correlate with prices quite nicely. Rather than simply use ending stocks, a ratio called \(Stocks\) \(to\) \(Use\) has come to be important in price analysis because it allows ending stocks to be expressed in a way that is not influenced by the (potentially growing) size of the market.

\[Stocks \text{ } to \text{ } Use_t = \frac{Ending \text{ } Stocks_t}{Total \text{ } Use_t} \]

Stocks to Use is simply the ratio of ending stocks to the total amount of the commodity ‘used’.

## The Data

Data from historical WASDE reports can be retrieved from the USDA’s Feed Grains Database. If you want to follow along, download the .csv file containing ending stocks for corn in the the U.S. and the rest of the world (ROW), as well as average prices received by farmers in the U.S. The following code imports the .csv file (make sure to modify the file path in `read.csv()`

to wherever you put the stocks.csv file if you are following along). Then the code creates the stocks to use variable in the line that calls the `mutate()`

function.

```
# If you haven't installed any of the packages listed below, do so with the "install.packages('tibble')" command.
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(gridExtra)
library(boot)
stocks <- read.csv('images/stocks.csv')
stocks <- as_tibble(stocks)
stocks <- mutate(stocks, USStockUse = USEndingStocks/USTotalUse, WorldStockUse = ROWEndingStocks/WorldTotalUse)
stocks %>% kable(caption = "U.S. and World Ending Stocks, Total Use, Prices Recieved by Farmers, and Stocks to USE, 1975-2017", "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
```

Year | USEndingStocks | ROWEndingStocks | USTotalUse | WorldTotalUse | PriceRecievedFarmers | USStockUse | WorldStockUse |
---|---|---|---|---|---|---|---|

1975 | 633.200 | 36411 | 5767.054 | 228199 | 2.54 | 0.1097961 | 0.1595581 |

1976 | 1135.600 | 39491 | 5789.200 | 235034 | 2.15 | 0.1961584 | 0.1680225 |

1977 | 1435.900 | 40833 | 6207.139 | 246973 | 2.02 | 0.2313304 | 0.1653339 |

1978 | 1709.500 | 47957 | 6995.479 | 254029 | 2.25 | 0.2443721 | 0.1887855 |

1979 | 2034.300 | 59481 | 7604.060 | 273641 | 2.52 | 0.2675281 | 0.2173687 |

1980 | 1392.100 | 67180 | 7282.444 | 293625 | 3.11 | 0.1911584 | 0.2287952 |

1981 | 2536.600 | 62725 | 6974.706 | 290686 | 2.50 | 0.3636856 | 0.2157827 |

1982 | 3523.100 | 60273 | 7249.090 | 279377 | 2.55 | 0.4860058 | 0.2157407 |

1983 | 1006.300 | 63421 | 6692.759 | 287033 | 3.21 | 0.1503565 | 0.2209537 |

1984 | 1648.200 | 76287 | 7031.963 | 297524 | 2.63 | 0.2343869 | 0.2564062 |

1985 | 4039.522 | 75069 | 6494.029 | 285517 | 2.23 | 0.6220363 | 0.2629230 |

1986 | 4881.693 | 80862 | 7385.350 | 298388 | 1.50 | 0.6609968 | 0.2709962 |

1987 | 4259.086 | 89488 | 7757.318 | 304440 | 1.94 | 0.5490410 | 0.2939430 |

1988 | 1930.428 | 96217 | 7260.121 | 319881 | 2.54 | 0.2658947 | 0.3007900 |

1989 | 1344.457 | 98715 | 8119.826 | 327783 | 2.36 | 0.1655771 | 0.3011596 |

1990 | 1521.245 | 102761 | 7760.655 | 319954 | 2.28 | 0.1960202 | 0.3211743 |

1991 | 1100.311 | 113106 | 7915.336 | 332232 | 2.37 | 0.1390100 | 0.3404428 |

1992 | 2112.981 | 109068 | 8471.119 | 339172 | 2.07 | 0.2494335 | 0.3215714 |

1993 | 850.143 | 107849 | 7621.383 | 349304 | 2.50 | 0.1115471 | 0.3087540 |

1994 | 1557.840 | 113777 | 9352.380 | 353437 | 2.26 | 0.1665715 | 0.3219159 |

1995 | 425.942 | 122467 | 8548.436 | 376204 | 3.24 | 0.0498269 | 0.3255335 |

1996 | 883.161 | 143886 | 8788.599 | 382278 | 2.71 | 0.1004894 | 0.3763910 |

1997 | 1307.803 | 133982 | 8791.000 | 388191 | 2.43 | 0.1487661 | 0.3451445 |

1998 | 1786.977 | 145972 | 9298.317 | 395867 | 1.94 | 0.1921828 | 0.3687400 |

1999 | 1717.549 | 150779 | 9514.784 | 412536 | 1.82 | 0.1805137 | 0.3654930 |

2000 | 1899.108 | 126880 | 9740.316 | 412710 | 1.85 | 0.1949740 | 0.3074314 |

2001 | 1596.426 | 110808 | 9815.402 | 424591 | 1.97 | 0.1626450 | 0.2609759 |

2002 | 1086.673 | 99276 | 9490.986 | 427821 | 2.32 | 0.1144953 | 0.2320503 |

2003 | 958.091 | 80334 | 10229.950 | 438453 | 2.42 | 0.0936555 | 0.1832215 |

2004 | 2113.972 | 77364 | 10660.530 | 465941 | 2.06 | 0.1982990 | 0.1660382 |

2005 | 1967.161 | 73472 | 11267.804 | 475772 | 2.00 | 0.1745825 | 0.1544269 |

2006 | 1303.647 | 75639 | 11206.620 | 499570 | 3.04 | 0.1163283 | 0.1514082 |

2007 | 1624.150 | 86418 | 12737.393 | 515187 | 4.20 | 0.1275104 | 0.1677410 |

2008 | 1673.311 | 100739 | 12007.572 | 526638 | 4.06 | 0.1393547 | 0.1912870 |

2009 | 1707.787 | 97474 | 13041.023 | 546250 | 3.55 | 0.1309550 | 0.1784421 |

2010 | 1127.645 | 93924 | 13033.141 | 570041 | 5.18 | 0.0865214 | 0.1647671 |

2011 | 989.027 | 103071 | 12481.947 | 607875 | 6.22 | 0.0792366 | 0.1695595 |

2012 | 821.185 | 112089 | 11082.899 | 606563 | 6.89 | 0.0740948 | 0.1847937 |

2013 | 1231.904 | 142987 | 13454.039 | 661861 | 4.46 | 0.0915639 | 0.2160378 |

2014 | 1731.164 | 165766 | 13747.918 | 686099 | 3.70 | 0.1259219 | 0.2416065 |

2015 | 1737.058 | 170837 | 13663.635 | 669447 | 3.61 | 0.1271300 | 0.2551912 |

2016 | 2293.303 | 171509 | 14648.801 | 747315 | 3.36 | 0.1565523 | 0.2295003 |

2017 | 2352.370 | 143333 | 14595.000 | 749749 | 3.30 | 0.1611764 | 0.1911746 |

Source: USDA Feedgrains Database

This is clearly not big data if I can print the whole data set right in the blog post! We have 43 years of data. Note that the 2017 observation is an expectation. We are not sure what carryout for the 2017 marketing year will be. The U.S. ending stocks and total use are in millions of bushels. The World ending stocks and total use are in 1,000 MT.

Now lets just plot the U.S. and World stocks to use versus price to get a sense of what the relationship looks like.

```
us <- ggplot(stocks, aes(x = USStockUse, y = PriceRecievedFarmers)) + geom_point() + theme_bw() + xlim(0, .7)
ROW <- ggplot(stocks, aes(x = WorldStockUse, y = PriceRecievedFarmers)) + geom_point() + theme_bw() + xlim(0, .7)
grid.arrange(us, ROW, nrow = 1, top = "U.S. and World Stocks to Use vs Prices Recieved by U.S. Farmers, 1975-2017")
```

In the U.S. panel, a clear non-linear relationship if evident. The points from 2010-2012 had low stocks to use and exceptionally high prices. The points with the highest stocks to use and lowest prices are from the 1980’s when the U.S. government was in the business of holding stocks to support prices. Despite the non-linearity, there is clearly a relationship between stocks to use and prices for corn, as we would expect.

In a farmdoc Daily article Darrel Good and Scott Irwin propose,

\[Price = \alpha + \beta \frac{1}{Stocks \text{ } to \text{ } Use} + \epsilon,\]

as a model for ‘predicting’ the average price received by farmers variable. This certainly seems like a sensible choice, given that the shape of the relationship resembles \(y = \frac{1}{x}\), but in this example, we will see if we can find another functional form that performs better in terms of cross validation MSE. We will also consider if the log of the stocks to use variable might fit, since this could also produce close to the correct curved shape (with a negative coefficient of course). I should note that in the farmdoc daily article, Good and Irwin use a much shorter sample that possibly reflects more closely current supply and demand dynamics better. But, in our case, we need the sample size to do the more data intensive cross validation.

We will use a 5-Fold cross validation to assess the models. Since World Stocks to Use might give additional predictive power, we will work it into some of our specifications. We will consider each of the following model types, and we will try them with the polynomial terms with degree from 1 to 5:

Model | Functional Form |
---|---|

Model 1 | \(Price = \beta_0 + \sum_{i=1}^n \beta_i \Big(\frac{1}{US \text{ } Stocks \text{ } to \text{ } Use} \Big)^i + \epsilon\) |

Model 2 | \(Price = \beta_0 + \sum_{i=1}^n \beta_i \Big(\frac{1}{US \text{ } Stocks \text{ } to \text{ } Use} \Big)^i + \sum_{i=1}^n \gamma_i \Big(\frac{1}{World \text{ } Stocks \text{ } to \text{ } Use} \Big)^i + \epsilon\) |

Model 3 | \(Price = \beta_0 + \sum_{i=1}^n \beta_i \Big(log(US \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \epsilon\) |

Model 4 | \(Price = \beta_0 + \sum_{i=1}^n \beta_i \Big(log(US \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \sum_{i=1}^n \gamma_i \Big(log(World \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \epsilon\) |

Model 5 | \(Price = \beta_0 + \sum_{i=1}^n \beta_i \Big(\frac{1}{US \text{ } Stocks \text{ } to \text{ } Use} \Big)^i + \sum_{i=1}^n \gamma_i \Big(log(World \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \epsilon\) |

In the code below, each for loop is fitting one of the models 1-5 and looping over the polynomial degree. So in total, we have fit 25 different models. In addition to that the line with the `cv.glm()`

function is performing the k-Fold cross validation, i.e. generating 5 test and hold out samples, calculating the MSE of each and then averaging it. The `cv.XXXX`

vectors are storing the cross validation MSE to plot and display below.

```
set.seed(17)
cv.us=rep(0,5)
cv.World=rep(0,5)
cv.Lus=rep(0,5)
cv.LWorld=rep(0,5)
cv.World2=rep(0,5)
for (i in 1:5){
glm.fit=glm(PriceRecievedFarmers~poly(1/USStockUse, i) ,data=stocks)
cv.us[i]=cv.glm(stocks,glm.fit,K=5)$delta[1]
}
for (i in 1:5){
glm.fit.World=glm(PriceRecievedFarmers~poly(1/USStockUse, i) + poly(1/WorldStockUse, i) ,data=stocks)
cv.World[i]=cv.glm(stocks,glm.fit.World,K=5)$delta[1]
}
for (i in 1:5){
glm.fit=glm(PriceRecievedFarmers~poly(log(USStockUse), i) ,data=stocks)
cv.Lus[i]=cv.glm(stocks,glm.fit,K=5)$delta[1]
}
for (i in 1:5){
glm.fit.World=glm(PriceRecievedFarmers~poly(log(USStockUse), i) + poly(log(WorldStockUse), i),data=stocks)
cv.LWorld[i]=cv.glm(stocks,glm.fit.World,K=5)$delta[1]
}
for (i in 1:5){
glm.fit.World=glm(PriceRecievedFarmers~poly(1/USStockUse, i) + poly(log(WorldStockUse), i),data=stocks)
cv.World2[i]=cv.glm(stocks,glm.fit.World,K=5)$delta[1]
}
res_tibble <- cbind(cv.us, cv.World, cv.Lus, cv.LWorld, cv.World2) %>% as_tibble() %>% add_column(PolynomialDegree = seq(1:5)) %>% round(2)
colnames(res_tibble) <- c('Model 1', 'Model 2', 'Model 3', 'Model 4', 'Model 5', 'PolynomialDegree')
p1 <- ggplot(res_tibble, aes(x = PolynomialDegree, y = cv.us)) + geom_line() + ylim(0, 75) + labs(x = 'Model 1 Poly Degree', y = 'Cross Validation MSE') + theme_bw()
p2 <- ggplot(res_tibble, aes(x = PolynomialDegree, y = cv.World)) + geom_line() + ylim(0, 75) + labs(x = 'Model 2 Poly Degree', y = '') + theme_bw()
p3 <- ggplot(res_tibble, aes(x = PolynomialDegree, y = cv.Lus)) + geom_line() + ylim(0, 75) + labs(x = 'Model 3 Poly Degree', y = '') + theme_bw()
p4 <- ggplot(res_tibble, aes(x = PolynomialDegree, y = cv.LWorld)) + geom_line() + ylim(0, 75) + labs(x = 'Model 4 Poly Degree', y = '') + theme_bw()
p5 <- ggplot(res_tibble, aes(x = PolynomialDegree, y = cv.World2)) + geom_line() + ylim(0, 75) + labs(x = 'Model 5 Poly Degree', y = '') + theme_bw()
grid.arrange(p1, p2, p3, p4, p5, nrow = 1, top = "Cross Validation MSE for Competing Models")
```

The figure shows the cross validation MSE we get from each of the five models fit with polynomial degree 1:5. Remember, to get the cross validation MSE the `cv.glm()`

function with `k = 5`

is creating 5 different non-overlapping test and hold out samples. The MSE plotted in the figure is average of the MSE from each of the 5 different hold out samples.

In the table below we present the same information, the cross validation MSE by model type and for each polynomial degree. Since there were several models with very similar MSE we can see exactly which models are getting the minimum MSE.

Recall that Model 1 with a 1 degree polynomial is the model proposed by Good and Irwin in the farmdoc Daily article. This model does quite well, it comes in third, and the models that do better only do marginally better. The first place model is Model 5 with polynomial degree 1; the second place model is Model 4 with polynomial degree 2.

Now to get a sense of whether or not these models are providing economically different predictions, we show predictions from the top three models in the table below.

Rank | Specification | 2017 Price |
---|---|---|

1 | \(Price = \beta_0 + \beta_1 \frac{1}{US \text{ } Stocks \text{ } to \text{ } Use} + \gamma_1 log(World \text{ } Stocks \text{ } to \text{ } Use) + \epsilon\) | 3.06 |

2 | \(Price = \beta_0 + \sum_{i=1}^2 \beta_i \Big(log(US \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \sum_{i=1}^2 \gamma_i \Big(log(World \text{ } Stocks \text{ } to \text{ } Use \Big)^i + \epsilon\) | 3.15 |

3 | \(Price = \beta_0 + \beta_1 \frac{1}{US \text{ } Stocks \text{ } to \text{ } Use} + \epsilon\) | 2.77 |

Currently, the March 2018 futures contract is trading at around $3.65, which is significantly higher than each of these predictions, but if you look back at the scatter-plot of the actual data, $3.65 is well within the typical variability for a stocks to use value of 0.16 which is our current expectation for the 2017/18 marketing year.

# That’s It!

That’s it for this time! I hope you enjoyed this explanation and toy example of cross validation. I certainly learned a lot writing it! I think you would only consider using cross validation for model selection if your data set was at least modestly bigger that the one in this post. However, this example goes to show that the technique can be useful even in medium data!

In the next post I will talk about Tree-Based Methods, Random Forests, Boosting, and Bagging from chapter 8 of ISL. I really currently know zero about these methods, but I have heard a lot of buzz about them. It is my sense that these are ‘real’ machine learning topics, whereas the cross validation and model selection stuff is more like necessary background.

Stay Tuned!

Great information and well articulated. Greatly appreciate this wonderful lesson. Waiting for the next post.

Thank you!

There are some tidy modeling packages that can get rid of some of those loops. Would you be interested in a modified version of the code that you’ve shown?

Certainly!