7 Comments

I taught economics for 34 years. I'm not sure the field matters: you have to experiment. Many attempts fail, but at some point one or more will work as hoped. I don't know how else you integrate new subject areas, especially where there is little (or even no) experience to draw on.

Jan 29 (edited)

"Most worthwhile quantitative work in the real world involves data cleaning. It is boring. It is remarkably difficult. It is immensely time consuming." Sounds like the perfect future job for AI.


Well. However good a job you are doing on data science, you are misinforming your students on one substantive point: the effective population size of the 8 billion humans living today is 10,000 to 20,000. That does not mean the actual population of humanity 75 kya was about 10,000.

Let's give the mic to John Hawks: "It is our conclusion that, at the moment, genetic data cannot disprove a simple model of exponential population growth following a bottleneck 2 MYA at the origin of our lineage and extending through the Pleistocene. Archaeological and paleontological data indicate that this model is too oversimplified to be an accurate reflection of detailed population history, and therefore we find that genetic data lack the resolution to validly reflect many details of Pleistocene human population change. However, there is one detail that these data are sufficient to address. Both genetic and anthropological data are incompatible with the hypothesis of a recent population size bottleneck."

https://academic.oup.com/mbe/article/17/1/2/975516


Lots of students graduate from university these days without command of the basic data science tools. If you major in English Literature, History, Education, French, etc., there is absolutely no requirement to understand them. I think you are conflating a PRAGMATIC university degree with university degrees in general.


The data cleaning/gathering part is a given, but if I had to choose *one* thing, it would be this (to borrow one of your recent metaphors): while you're typing in a Jupyter notebook, you're auditing the books of a factory, not looking inside while people build things.

Data is just numbers recorded by some combination of machines and people, meaningless without understanding what those machines and people were doing. The *name* of a variable is no guarantee of what relationship that variable has with anything in the world. Models and coefficients are fingers pointing to small, hopefully useful drawings of bits of the Moon, not the Moon itself. And so on.

And "data science models" are even flimsier than that...

"Clean your data" is almost a corollary of this, as long as you don't grant too much totemic power to the cleaned-up data. And gods save us from economists who take the coefficients of the first model that fit well enough to publish and reify them into real aspects of the world.

FWIW, this isn't an entirely negative plea. The more I've internalized this, the better I've become at doing data analysis, or at least at getting useful results from it. I also think Jaynes's view of Bayesian statistics as the only consistent extension of classical logic is still criminally underrated, even as Bayesian methods "won." Navigating the back-and-forth between doing things with symbols and numbers and saying things about the world is at least slightly less hard if we keep the distinction in mind.

Also, I realize this might be best taught after students have had some practice to ground it, not at the outset, when it seems esoteric.


For 15 years I have taught residential real estate appraisers how to use Excel for basic data analysis. I've found the best way to teach is in small teams, grouped by ability to use Excel, with helpers available to assist. I have no idea how practical that is for you, but the key for me is to separate out those who are totally clueless from the rest to keep the class moving forward. I'm moving to RStudio for my own work in the next couple of weeks.


Please over-emphasize the data cleaning part (and data gathering, of course). It may matter less in academia, because sooner or later someone will address the data problems by publishing a corrected result. In non-academic spheres, bad data loses you money (and lives), most of the time more money than you could ever repay from your relatively meagre salary. Lives, of course, cannot be repaid at all.
