In Data Science for Non-STEM Majors, Is Learning-by-Watching Live Calculating Possible? Likely? Reasonable to Expect?
We commit as much intellectual malpractice when we let our students graduate from the university these days without command of the basic Data Science tools as any of the professors of the late-mediæval university would have committed should they have allowed their students to graduate without command of a fine chancery hand…
However, whenever I have tried to act on this and teach my students the basic data science tools as part of some exploratory-data-analysis modules of my courses, I have for the most part failed, and have quickly retreated back to my standard pedagogy. The only cases in which it has worked have been those when I have been teaching Economics 101B, macroeconomics for STEM majors. They know the tools, and so using them for data analysis, and also for model simulation, serves as a reminder of what they learned, a skill-booster for what they ought to know, and an aid to their comprehension of the meat of the class. In my other classes, however, the bimodal distribution of students’ previous experience with data science means that my attempt to demonstrate use of the tools is either horrifyingly opaque and bewildering or boringly elementary, depending on which end of the barbell distribution a particular student falls into.

So, then, why am I trying this again? Because we need to find a way to do this if we are to do our jobs properly. And because I am a glutton for punishment…
2025-01-28 :: American Economic History :: Very Long Run Growth
J. Bradford DeLong
<https://datahub.berkeley.edu/hub/user-redirect/lab/tree/2025-01-28-econ-113/2025-01-28-econ-113-very-long-run-growth.ipynb>
This is the very first Python Jupyter notebook I am creating for the spring 2025 instantiation of UC Berkeley Econ 113. Its purpose is to illustrate how one can use Python Jupyter notebooks to do calculations and manipulate data in such a way that, afterwards, you can figure out what you have done.
The right approach to take to this task is to think always that you are writing for the greatest idiot imaginable—for there is nobody so idiotic as yourself a year from now, trying to figure out why past-you wrote down all of the numbers that you did back then.
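In that spirit, here is a tiny, hypothetical example of the style the notebook aims at (not the notebook's actual code): every number goes into a named box, with a comment recording where the guess came from, so that future-you can audit it:

```python
# Guesstimates, each in a named box, each with its provenance noted,
# so that future-you can see where the numbers came from:
ancestral_population = 10_000          # ~100 groups of ~100 apes, Horn of Africa, ~75,000 years ago
world_population_2025 = 8_200_000_000  # rough current world population, order of magnitude only

# How many times over have human numbers multiplied since then?
population_multiple = world_population_2025 / ancestral_population

print(f"Roughly an {population_multiple:,.0f}-fold multiplication")
# -> roughly an 820,000-fold multiplication, which the lecture rounds to "a millionfold"
```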
Human Population (in millions)
We guess that, for 95% of us alive today, more than 90% of our genes come from about 100 groups of 100 Large East-African Plains Apes wandering around near the Horn of Africa some 75000 years ago. Those 10,000 are our ancestors.
There were back then lots of other groups of Large East-African Plains Apes, a total worldwide population of perhaps 1 million? And we can see their existence in a (small) component of our genes, as we spread out across the world and "interacted" with and then replaced them. But it seems not unreasonable to take those 10,000 as us, for either they were phenomenally lucky or they behaved significantly differently from other Large East-African Plains Apes in the process that made them, and not other groups, the overwhelming sources of our genome.
(Parenthetically, that means that we are all very close cousins: there is less genetic diversity in the entire human race than in your standard baboon troop of 100. The "selfish gene" point of view says that sexually reproducing animals tend to evolve kin solidarity: that you ought to be willing, from your genes' point of view, in the sense that those are the genes that will tend to spread, to lay down your life so that more than 2 siblings or more than 8 first cousins can live. But our roots in those long-ago 10,000 mean that we are so inbred that, effectively, our genes "want" us to be massively other-regarding and self-sacrificing, and to act on the principle that "the needs of the many outweigh the needs of the few.")

2025-01-28 :: American Economic History :: Very Long Run Growth :: J. Bradford DeLong :: <https://datahub.berkeley.edu/hub/user-redirect/lab/tree/2025-01-28-econ-113/2025-01-28-econ-113-very-long-run-growth.ipynb>
MOAR observations below the fold:
The basic idea is to start with my (unmotivated in this lecture) long-run historical guesstimates of human population and human average real income, and then perform calculations with a very visible and clear audit trail to get to these two tables—the first of the past 75000 years’ worth of the levels of the human population, of average human real prosperity, and “technology”—the deployed capability of humans to manipulate nature and work together productively and coöperatively—and the second of the proportional growth rates of numbers, average income, and deployed technology:
This is the shape of the human economy at the most zoomed-out, eagle-eyed view possible over the past 75000 years. We cannot talk about American economic history without this as background, for we need it to assess both how America participates in and contributes to this average world-historical pattern, and how it diverges from it.
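To make the audit-trail idea concrete, here is a minimal, hypothetical sketch of the kind of calculation involved. The population and income numbers below are placeholder guesstimates, not the lecture's actual figures, and the "technology" index used here (real income times the square root of population, a standard Malthusian-style bookkeeping choice) is an assumption of this sketch, not necessarily the notebook's definition:

```python
import numpy as np
import pandas as pd

# Placeholder guesstimates (NOT the lecture's actual numbers): world population
# in millions and average real income per person per year, at benchmark dates.
benchmarks = pd.DataFrame(
    {
        "population_millions": [0.01, 5.0, 500.0, 1300.0, 8000.0],
        "income_per_capita": [1200.0, 1200.0, 900.0, 1300.0, 12000.0],
    },
    index=[-73000, -6000, 1500, 1870, 2020],  # calendar years; negative = BCE
)

# One possible "technology" index (an assumption of this sketch): deployed
# capability proportional to average income times the square root of population.
benchmarks["technology_index"] = benchmarks["income_per_capita"] * np.sqrt(
    benchmarks["population_millions"]
)

# Proportional (logarithmic) growth rates per year between successive benchmarks.
years_elapsed = pd.Series(benchmarks.index, index=benchmarks.index).diff()
growth_rates = np.log(benchmarks).diff().div(years_elapsed, axis=0)

print(benchmarks.round(1))
print(growth_rates.round(6))
```

The point is less the particular numbers than the visible audit trail: every step from guesstimated levels to growth rates sits out in the open, ready to be questioned.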
If we get to these tables in time, we will then have a chance to discuss:
How we see no improvement in average material standards of living between year -73000 and 1870, with pronounced impoverishment during the long Agrarian Age between the Neolithic Revolution invention of agriculture and the 1870 start of the Second Industrial Revolution…
How the numbers of “us” have grown a millionfold over 75000 years, with the first thousandfold multiplication taking 69000 years and the second taking 6000, and with the last three tenfold multiplications taking 3500, 2300, and then 200 years, respectively (a quick arithmetic check is sketched just after this list)…
The geometric midpoint of the ten-thousandfold amplification of technology coming in 1900: as much proportional (and twelve times more absolute) growth in “technology” in the past 100 years as in the previous 74900…
That it was America that was the Prime Mover driving this forward march of technology since 1870 (with relatively small assists after 1870 from Germany, Switzerland, Japan, and most recently the Pearl River Delta)…
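The arithmetic behind those multiplications is exactly the kind of thing a code cell makes easy to check. A quick, hypothetical sketch, using the round numbers from the list above:

```python
import numpy as np

# The round numbers from the list above: each multiplication of human numbers,
# and roughly how long it took.
episodes = {
    "first thousandfold": (1_000, 69_000),
    "second thousandfold": (1_000, 6_000),
    "tenfold": (10, 3_500),
    "next tenfold": (10, 2_300),
    "latest tenfold": (10, 200),
}

for label, (multiple, years) in episodes.items():
    rate = np.log(multiple) / years  # continuously compounded growth per year
    print(f"{label}: {multiple:,}x over {years:,} years ≈ {rate:.4%} per year")
```

Run it and the acceleration jumps out: from about 0.01% per year over the first 69000 years to better than 1% per year over the last two centuries.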
But I really, really do not think that we will get there.
What I do hope we will get to discuss as I run through entering symbols into code cells in the Python Jupyter notebook, running them, debugging, and looking at output are:
Stuffing numbers (and strings of symbols) into conceptual boxes, each labeled with a name, and then linking the boxes together and defining tools to manipulate and display the numbers and symbols in them (a minimal sketch of this appears just after this list)…
A pseudo-quasi-English-languagelike syntax that is, and has to feel, arcane and opaque…
The inability of the computer to accept the command “do what I mean”…
The brand new ability of ChatGPT and its ilk to guess correctly at “this is probably what you mean to do, and here is how to do it”…
How data cleaning and data manipulation are the most important part of being any kind of a quant, or even of being successfully quant-adjacent: Most worthwhile quantitative work in the real world involves data cleaning. It is boring. It is remarkably difficult. It is immensely time consuming. It is absolutely essential…
Use Excel, and you wind up in trouble. Cf.: Reinhart-Rogoff…
And, most important: Expect bewilderment! Things go wrong!!
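To give the "labeled boxes" and data-cleaning points above a concrete, hypothetical form: variables and dataframes are the named boxes, functions are the tools you define to manipulate what is in them, and even a single numeric column usually needs cleaning before any arithmetic will work. None of this is the notebook's actual code; it is a sketch of the flavor:

```python
import pandas as pd

# Named boxes, linked together into a table. The population figures arrive, as
# real-world data usually does, as strings with thousands separators.
populations = pd.DataFrame(
    {"year": [1500, 1870, 2020], "population_millions": ["500", "1,300", "8,000"]}
)

# Data cleaning: strip the separators and turn the strings into numbers,
# because no arithmetic works until this is done.
populations["population_millions"] = (
    populations["population_millions"].str.replace(",", "", regex=False).astype(float)
)

# A "tool": a small function defined to manipulate and display what is in the boxes.
def multiple_since(df: pd.DataFrame, start_year: int) -> float:
    """How many times larger is the latest population than that of start_year?"""
    start = df.loc[df["year"] == start_year, "population_millions"].iloc[0]
    latest = df["population_millions"].iloc[-1]
    return latest / start

print(f"Multiplication of human numbers since 1870: about {multiple_since(populations, 1870):.1f}-fold")
```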
If you do not often feel like a Sorcerer's Apprentice having a bad day, you are doing computer programming and data science wrong. This feeling is remarkably common among programmers. It is explicitly referenced in the introduction to the classic computer science textbook, Abelson, Sussman, & Sussman: Structure and Interpretation of Computer Programs:
In effect, we conjure the spirits of the computer with our spells. A computational process is indeed much like a sorcerer’s idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer’s spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform. A computational process, in a correctly working computer, executes programs precisely and accurately. Thus, like the sorcerer’s apprentice, novice programmers must learn to understand and to anticipate the consequences of their conjuring.... Master software engineers have the ability to organize programs so that they can be reasonably sure that the resulting processes will perform the tasks intended...
Master software engineers can be reasonably sure that processes will perform the tasks intended. Master. Reasonably. Intended.
I taught economics for 34 years...I'm not sure it matters what the field is, you have to experiment. Many fail, but at some point, one or more will work as hoped. I don't know how else you integrate new subject areas, especially where there is not a lot (or even any) experience.
"Most worthwhile quantitative work in the real world involves data cleaning. It is boring. It is remarkably difficult. It is immensely time consuming." Sounds like the perfect future job for AI.