Photo by Dirk Vorderstraße (Mähdrescher von John Deere in der Warburger Börde) [CC BY 2.0], via Wikimedia Commons
Post 1 in a series of 3.
Post 2: Machine Learning and Econometrics: Model Selection and Assessment Statistical Learning Style
Post 3: Machine Learning and Econometrics: Trees, Random Forests, and Boosting
Machine learning and artificial intelligence are the biggest topics in tech right now, and the excitement is spilling over into economics. The 2018 ASSA meetings (I did not attend, just browsed the program) had at least five sessions devoted to machine learning. Susan Athey and Guido Imbens held a wildly popular continuing education workshop at the meetings that was later released as a webcast. At the 2017 ASSA meetings, which I did attend, there were only one or two sessions devoted to machine learning and AI in economics, but they were standing room only.
I have to confess I know next to nothing about machine learning and artificial intelligence. I know that it is vaguely connected to the idea of ‘Big Data’, because large datasets make it possible to fit (or overfit) flexible functional forms. What I don’t know is how big your data needs to be. A lot of empirical work in agricultural economics falls decidedly in the realm of medium data. Our sample sizes are respectable, but not Amazon-user-clicks big.
One area where I am sure it fits well is precision agriculture. If you are a big agribusiness company like John Deere, and you have data streaming in from client equipment that gives you meter-by-meter measures of field characteristics on millions of acres, I’m sure machine learning tools will be appropriate. For the rest of us I’m not sure, but I want to find out.
The Point of this Post
I convinced Todd Kuethe and Todd Hubbs to try to write a paper with me predicting commercial bank delinquency rates using neural networks. We’re supposed to present this research at the SCC-76 meetings in Kansas City on April 5, and none of us have any experience (at least I don’t) with neural networks or machine learning, but this is good motivation to figure it out!
Ag econ is great in that most of our conferences require only a proposal. Some fields make you submit a whole paper just to get on the program. I am sure I would never learn anything new if I had to write a whole paper before I committed to stand in front of people and talk about it. I’ve been meaning to dig into machine learning and see what it is all about for a couple of years. Now, with the commitment mechanism of having to produce a paper for the SCC-76 meetings, I will at least get through the basics.
The reason I’m writing a series of blog posts about this is that I think a lot of ag economists are in the same boat as me with regard to machine learning, so I will share the big picture of what I learn here. I’m going to start by going through An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie, and Tibshirani. A PDF of the book is available on the book’s website, as are all the code and data. It is also available for purchase on Amazon.
An Introduction to Statistical Learning
In this series of posts I’m not going to go through the code examples; those are available on the book’s website. Instead, I’ll be working out in my own mind how the approaches of machine learning and traditional econometrics are similar and different, and how ‘big’ your data needs to be to apply machine learning with some success. As I gain insight into these questions, I’ll share it here. If you want an introduction to machine learning versus traditional econometrics, this talk Susan Athey gave is a pretty good primer.
Looking forward to SCC-76 and your paper! I think there are lots of exciting potential uses. We have an R&R for a paper that uses machine learning with ARMS data. dyson.cornell.edu/faculty-resear… @KevinPatrick_
Cool! The SCC paper could totally bomb and then we’ll have to come up with something else!
Hi Mindy,
Looking forward to seeing your work on commercial delinquency on your blog. BTW, have you tried Orange https://orange.biolab.si/? It is open-source data visualization software I came across a few weeks ago. It might be helpful for comparing ML models.
Best Regards,
Atul
I love the idea of “live blogging” the writing of a paper.
You are onto something both relevant and important. That said, @TKuethe and Hubbs might slow you down. 😊
We’re there to make sure other people can catch up!
Looking forward to this. I’ve wrestled with these topics for a while: http://econometricsense.blogspot.com/2015/09/econometrics-math-and-machine.html but have not had a chance to keep up with all of Athey’s work. I’ve got to find time for the videos you mentioned. I’d also recommend the O’Reilly text Data Science for Business for those interested in general data science but wanting something less technical than The Elements of Statistical Learning, although that is a great book!