Time to actually try to create for myself an informed view on what the likely econo-societal effects of MAMLM—Modern Advanced Machine-Learning Models—will be. Wish me luck!…
Some very interesting observations.
What should we call this technology? IMO, we should use easy-to-recall names, not acronyms. Just as rule-based models with hand-coded knowledge were called "expert systems", and models whose rules were learned from data were called "decision trees", I would use the term "language systems" or "language models".
IMO, these language models are replicating Kahneman's "thinking fast" (System 1). System 1 allows fluid verbiage that flows without deliberate thought. For example, when I used to mimic Robin Leach's patter about the lifestyles of the rich, I had no idea what I was going to say next; I just let the words flow. In a similar vein, for those who speak two or more languages, stress may revert the structure of spoken English (the second language) to that of the native language, which may be more deeply embedded in the cortex.
Idk about "Laws of Thought", but I have been playing with testing Chomsky's innate grammar against ChatGPT (3.5), and I think it may invalidate it. Our language acquisition may be purely mimicry, i.e., learning by example, just like language models.
Will language models change society? Historically, we built systems and machines to carefully reproduce a consistent result: from draught animals turning a wheel, to powered machinery weaving cloth, to factory systems turning out exact replicas of objects. These muscle substitutes hugely enhanced our societal productivity. Computers used in association with these tools, such as those controlling robots, do so as well. But computers used as "bicycles for the mind" don't just speed the journey from A to B; they allow exploration and mental journeys far further afield. This is why computer software like spreadsheets, word processors, and so forth does not reduce employment but expands it. It is a mental equivalent of Parkinson's Law (C. Northcote Parkinson): mental work will expand to fill the time allowed for its completion.
Language models have, however, a flaw that you demonstrate at the outset. They are not like software that controls machines to replicate outputs; instead, they behave like probabilistic Markov models. The output will vary with small changes in input (prompts). This suggests to me that unless this can be fixed, language models are best used where accuracy is not needed. Software algorithms, by contrast, must be accurate and not break, and this is tested with QA methods, especially to detect "corner cases". (Expert systems and decision trees were "brittle": unexpected data "broke" the rule structure.) So language models work well to create drafts of text, images generated by similar approaches, and other creative applications where accuracy is not important. Where they fail is where accuracy is needed, such as your citation-builder example.
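A minimal sketch of the point about probabilistic output, using a toy Markov chain with invented transition probabilities (this is not how any real model is built; it only shows that sampling from a probability distribution yields different output on different runs):

    import random

    # Toy next-word transition probabilities -- all numbers invented for illustration.
    transitions = {
        "the": [("cat", 0.5), ("dog", 0.3), ("market", 0.2)],
        "cat": [("sat", 0.6), ("ran", 0.4)],
        "dog": [("barked", 0.7), ("slept", 0.3)],
        "market": [("crashed", 0.5), ("rallied", 0.5)],
    }

    def generate(start, steps=2):
        """Sample a short continuation; runs differ because we sample rather than take the argmax."""
        word, output = start, [start]
        for _ in range(steps):
            options = transitions.get(word)
            if not options:
                break
            words, probs = zip(*options)
            word = random.choices(words, weights=probs)[0]
            output.append(word)
        return " ".join(output)

    print(generate("the"))  # e.g. "the cat sat"
    print(generate("the"))  # may differ on another run: "the dog barked"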
Just as software engineers end up in cycles of reiterating code to meet the non-expert's verbal instructions ("That is not exactly what I meant; can you do [X]?"), language models need to be able to handle recursion of prompt instructions. But just as that complexity often results in "I could do this faster myself than by repeated instructions to subordinates", so I think language models will not be productivity-enhancing without help. Can they be induced to attempt Kahneman's "thinking slow"? Idk. What I do think is possible is that they be integrated with existing software that does the task effectively. In your example, there are many citation builders available. Some can guess at the correct citation and output it directly from a database of content using just the title. If that fails, then entering details into the input fields solves the problem. Integrating a language model with an existing citation builder would likely produce accurate output and would be productivity-enhancing. If you need real math done, integrate with Mathematica to build the model and test inputs. Once a correct version is built, "fix" that model for future requests, rather than having the language model build it from scratch each session. Similarly with texts: go through the selection process for the texts to use with the language model, and fix those texts in a database, so that requests for information from texts always use those texts alone. It should not try to build a language model on those texts, but extract exact pieces from them to build a précis, or to build an argument "for and against" an assertion to be tested. Just as computers and software do not reinvent basic operations on numbers, so language models should operate using existing algorithms where accurate results are needed, and use their creative, sloppy responses to convert the output to human language if needed.
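A minimal sketch of that division of labor for the citation case, just as an illustration: the llm_* function and the hard-coded "database" below are invented stand-ins for whatever language-model API and citation database one actually uses.

    # Sketch only: llm_extract_title stands in for a language-model call,
    # and FAKE_DB stands in for an existing citation database or API.

    FAKE_DB = {
        "slouching towards utopia": {
            "author": "DeLong, J. Bradford",
            "title": "Slouching Towards Utopia",
            "year": 2022,
            "publisher": "Basic Books",
        },
    }

    def llm_extract_title(raw_reference: str) -> str:
        """Fuzzy step: the model guesses a probable title from messy input text."""
        return raw_reference.strip().lower()

    def citation_db_lookup(title: str):
        """Exact step: an existing citation database returns structured fields."""
        return FAKE_DB.get(title)

    def format_citation(record: dict) -> str:
        """Exact step: deterministic formatting from structured fields; no model involved."""
        return f'{record["author"]} ({record["year"]}). {record["title"]}. {record["publisher"]}.'

    def build_citation(raw_reference: str) -> str:
        title = llm_extract_title(raw_reference)
        record = citation_db_lookup(title)
        if record is None:
            raise ValueError("Fall back to asking the user for the fields the database needs.")
        return format_citation(record)

    print(build_citation("Slouching Towards Utopia"))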
You make a lot of good points. Thanks.
Also: Have you heard of Williams syndrome? There are physical deformities, but also facility with complex language and the generation of nonsense. Some scholars wonder if the stereotype of the prattling court jester was based on the disorder.
"ChatGPT4 “identifies” and slots lastname, firstname, title, website, and URL into the format. It does all of these correctly."
Its subtitle is "Information Technology and the Future of Society." Your original subtitle is "Information Technology in the Service of Society."
Touché...
Brad, have you read Bob Wright's post that included a study comparing chatbot-versus-human debates with human-versus-human debates? How does this psychological interaction figure into your thinking? From the blog:
"In the bare bones version of the experiment, the bots were only slightly better than humans at moving their debate opponents toward their position; the difference wasn’t big enough to pass the test for statistical significance. But when bots were given demographic data about their debate opponents, they got much better at changing minds. Humans, given the same data about their opponents, got slightly worse at persuasion. The authors write, “Not only are LLMs able to effectively exploit personal information to tailor their arguments, but they succeed in doing so far more effectively than humans.”
https://open.substack.com/pub/nonzero/p/the-new-persuaders?r=66ut&utm_campaign=post&utm_medium=email
Does it replicate? P-hacking is a thing. "Clever Hans" is a thing. Except for code copilots, it takes a lot of bespoke human skull sweat to get things truly useful out of these...
?? I don't understand your reply. As I read it, it's not that the bots did this on purpose. It's that, given this finding about the bots, humans can feed them the appropriate data about their targets and use them to help change people's minds.
Kahneman: “Flawed stories of the past shape our views of the world and our expectations for the future. Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen. Any recent salient event is a candidate to become the kernel of a causal narrative.”
Every pundit has used this approach to make themselves sound smart. And of course, no one wants to admit they were wrong before, so forget the last "analysis" and just listen to the current one.
And let us not forget the proven approaches of explanation since time immemorial:
"It is fate" or "It is God's Will".
I wish I could have an AI substitute those words and put them in the mouth of every hack pundit who tries to explain the immediate past and how it will play out going forward.
Aren't you relying (or going to rely) too heavily on the 'there is no way to predict the future impact of technology' assumption you and Noah Smith argue for? We know the heat death of the universe from first principles even though we don't know how entropy is distributed locally. So assuming that over time AI exceeds all human reasoning, and can therefore substitute for all humans, is an interesting and relevant edge case to contemplate. No?
It's interesting you cite the Wolfram post as Alessandrini, Klee, and Wolfram -- crediting as authors the people that Wolfram acknowledges as helping with the piece -- but the "cite as" on the post reads: "Stephen Wolfram (2023), "What Is ChatGPT Doing ... and Why Does It Work?," Stephen Wolfram Writings. writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work." And the near-identical book is credited to Wolfram alone.
Of course there have been controversies and lawsuits, one of them outlined in detail by Cosma Shalizi a while back, about Wolfram claiming authorship (on top of less-disputed ownership) of employees' work. But what's going on in this case? From a quick search, it seems that no one other than Shalizi (who says he was prompted by "A.G." [= his colleague Albert Gu?] "to list the authors properly") refers to the piece in this fashion, in the bactra.org post that you list above.
Your comments are excellent framing and I think worth composing for a wider audience. In particular:
1) MAMLMs (or whatever) are powerful, but not intelligent. Don't call them AI, to prevent nonsense and propaganda from being sold as insightful and authoritative; conversely, the output shouldn't be ignored.
2) Rapid change poses risks that authoritarians will use to gain power. To preserve democracy, we need to manage the burden of change rather than prevent change, particularly through focused public investment.
3) New technology is often both labor-substituting and labor-complementing, though we are awful at forecasting the labor-complementing part.
A good question is whether these language systems are more like an FFT or an SVD. You can learn a lot about a system from an FFT because the cycles in the frequency domain are often solutions to the differential equations that describe the generating system. If you use an FFT to study tides, you'll quickly recognize the influence of the moon and the sun and various properties of their orbits. You can learn a lot less from an SVD. You can use an SVD to create a simpler model of a system, extract a subproblem, or eliminate a variety of artifacts, but the basis vectors are what they are. Sometimes you can learn things about the system, but for more complex systems too many things get aggregated to be studied usefully.
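A small numerical sketch of the contrast, using a toy tide-like signal (the 12.42 h and 12 h periods are the standard approximate lunar and solar semidiurnal values; the amplitudes are invented):

    import numpy as np

    # Toy "tide": two sinusoids standing in for the lunar (M2, ~12.42 h) and
    # solar (S2, 12 h) semidiurnal components, sampled hourly for 180 days.
    t = np.arange(0, 180 * 24, 1.0)
    signal = 1.0 * np.sin(2 * np.pi * t / 12.42) + 0.5 * np.sin(2 * np.pi * t / 12.0)

    # FFT: the peaks sit at physically meaningful frequencies -- the driving
    # periods of the moon and sun fall straight out of the spectrum.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(t), d=1.0)
    peak = freqs[np.argmax(spectrum[1:]) + 1]              # skip the DC bin
    print("dominant period (hours):", round(1 / peak, 2))  # ~12.4, the lunar component

    # An SVD of the same data, arranged as overlapping windows, compresses it just
    # as well, but the singular vectors are only an orthogonal basis; nothing in
    # them is labeled "moon" or "sun".
    X = np.lib.stride_tricks.sliding_window_view(signal, 48)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    print("energy captured by first 4 components:", round((s[:4] ** 2).sum() / (s ** 2).sum(), 3))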
If you look at the basic N-space word vectors that let language systems solve word-analogy problems, the dimensions really don't correspond to anything semantically useful. It's a lot like the multidimensional encoding in an old-fashioned modem, more about geometric constraints than informational content. In a Huffman encoding, the compressed sequence tells you something about the frequency tree and common patterns in the data, but in an N-space modem encoding it's just about cramming as many spheres as possible into a higher-dimensional space.
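A toy illustration of the word-vector analogy arithmetic (the four-dimensional coordinates below are invented for the example; real trained embeddings have hundreds of dimensions and are nowhere near this legible):

    import numpy as np

    # Invented toy vectors, only to show the arithmetic that trained embeddings support.
    vectors = {
        "king":  np.array([ 0.8,  0.7,  0.1,  0.3]),
        "queen": np.array([ 0.8, -0.7,  0.1,  0.3]),
        "man":   np.array([ 0.1,  0.7, -0.2,  0.6]),
        "woman": np.array([ 0.1, -0.7, -0.2,  0.6]),
        "apple": np.array([-0.9,  0.0,  0.8, -0.4]),
    }

    def closest(target, exclude):
        """Return the vocabulary word whose vector is nearest (by cosine) to `target`."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max((w for w in vectors if w not in exclude),
                   key=lambda w: cos(vectors[w], target))

    # king - man + woman lands nearest to queen: the analogy is solved by geometry,
    # not by any human-interpretable meaning attached to the individual dimensions.
    analogy = vectors["king"] - vectors["man"] + vectors["woman"]
    print(closest(analogy, exclude={"king", "man", "woman"}))  # -> queen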
Artificial intelligence has spun off many fields of application: machine vision, text recognition, expert systems, robotic control, and the like. In fact, the way one can tell that an AI-driven solution was successful is that no one really thinks of it as AI anymore. Right now, these language systems are fascinating, and we may be able to learn a lot from them. Unfortunately, there aren't yet any applications solid enough for them to be considered solutions. We'll know these language systems have arrived when the "AI" hype vanishes and people and pundits just talk about conversational interaction systems (CIS) or document summarization utilities (DSUs) or whatever.
An AI researcher I respect was blown away by this AI https://dev.hume.ai.
You might want to get access and see if you can customize a BradBot using it.
Here is what occurs to me. In my working day, I write down models of stochastic processes that are supposed to represent changes in the inputs to the valuation models for financial derivatives: things like interest rates, spot prices for cash-and-carry assets, forward price curves for commodities like gas or power, inflation rates, implied volatilities for all these things, and so on. We pick models for the tractability of their solutions, for the stability of their calibrations, and of course for their ability to match market prices within arbitrage constraints. We start with the simplest model that will do the job; the ideal number of free parameters is one. When that fails we start adding bells and whistles: a term structure of drift, a term structure of volatility, stochastic volatility, jumps in the process, and so on. But no matter how many meta-levels we lard in, we never think that we will actually capture the complete dynamics of processes that are ultimately determined not just by underlying physical processes but by the psychological interactions of millions of market participants. This was basically Keynes' objection to the idea that, at some level, model parameters would become stable.
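A minimal sketch of that "start simple, then add bells and whistles" progression, assuming a zero-drift lognormal model; every number below is invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_gbm(s0, sigma, dt, n_steps, n_paths):
        """Simplest model: one free parameter (a single constant volatility), zero drift."""
        z = rng.standard_normal((n_paths, n_steps))
        log_paths = np.cumsum(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z, axis=1)
        return s0 * np.exp(log_paths)

    def simulate_gbm_term_vol(s0, sigmas, dt, n_paths):
        """First bell and whistle: a term structure of volatility, one sigma per step."""
        sigmas = np.asarray(sigmas)
        z = rng.standard_normal((n_paths, len(sigmas)))
        log_paths = np.cumsum(-0.5 * sigmas**2 * dt + sigmas * np.sqrt(dt) * z, axis=1)
        return s0 * np.exp(log_paths)

    # Invented numbers: a flat 20% vol versus a vol curve rising from 15% to 30%.
    flat   = simulate_gbm(100.0, 0.20, dt=1/252, n_steps=252, n_paths=10_000)
    curved = simulate_gbm_term_vol(100.0, np.linspace(0.15, 0.30, 252), dt=1/252, n_paths=10_000)
    print("1y terminal std, flat vol :", round(flat[:, -1].std(), 2))
    print("1y terminal std, vol curve:", round(curved[:, -1].std(), 2))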
But if the grandiose claims that the current line of "AI" development can lead to *general* artificial intelligence are true, then we are wrong. If it is possible to exactly model the entirety of human action by statistical methods, then it follows that we *can* exactly capture the dynamics of our price factor processes also by statistical methods. There *would be* some meta-level at which stable parameters would be achieved.
Now, who really believes this when you put it like that?
There is a big difference between the "psychohistory" you are effectively alluding to and AGI. AGI is human-level intelligence, and since you cannot perfectly predict the future of financial instruments, neither can an AGI.
We aren't trying to predict the future.
It sounds like you are modeling the instruments, which should, if accurate, provide a window on the future, however short.
:-)