2023-04-21 Fr: continuing to try to train a ChatBot so that it can give talks on and answer questions about Slouching Towards Utopia <bit.ly/3pP3Krk>...
So my original academic background is in natural language processing. I wrote some very primitive statistical language models, and I worked through much of the '00s at Motorola, on predictive text entry software. (All of my Motorola patents got sold to Google at some point, so whenever you say "damn you, autocorrect!" to an Android -- sorry!) I studied at Johns Hopkins in the back half of the '90s, under Fred Jelinek, who's widely regarded as the originator of the revolution in speech recognition and text processing, replacing attempts to code formal grammars with abstract statistical models.
So I have some rudimentary understanding of what these modern much-fancier statistical chat-bots are doing, and I feel fairly confident in saying that in terms of reaching "artificial general intelligence", they are a dead end.
Boston Dynamics' terrain navigating robots are a much more promising avenue of research. AGI, if it ever emerges, is going to need to have some kind of grounding in reality. It needs first a model of the world, and a model of itself in relation to the world. Building from that, you'd want to gradually generalize its model of itself, to recognize that some of what's in the world is other agents like itself -- so then you get theory of mind. (In this area, AIs that control avatars in games -- where they deal with a virtual world, and other agents, and try to achieve goals -- are potentially a relevant area of research. We've already seen some transfer of code for efficient control loops from gaming, to controls for drones, self-driving vehicles, etc.) Even the most primitive animal has something like this -- prey animals are constantly trying to detect predators in the environment and predict their attack vectors, in order to escape.
Once you have an AI that is reality-grounded, _then_ you can start to layer on language, teaching it how to explain itself, and understand the intentions and requests of other agents. (Though of course this raises all kinds of interesting questions around the alignment problem. These explanations could be false both for reasons of willful deception -- the AI becoming trained to tell you what you _want_ to hear, regardless of the truth -- and just because explaining motivation is hard. _Humans_ are quite bad at understanding their own motivations, let alone interpreting other people's.)
Without attachment to reality, a language model has no way of understanding what is true or useful. As you say, it lacks any kind of grounding in a theoretical framework. It's just producing words that sound vaguely similar to its source material. It's Frankfurtian bullshit.
The hope is that you can, via proper prompting, send it to a corner of the training data space that says true things about the subject matter of the question asked...
Sure, but as long as there is nothing in the model itself that represents the difference between "true" and "truthy", you're always stuck with that kind of prompt engineering. It's not artificial intelligence, it's artificial smart-sounding-ness.
I rather like the joke that it's "mansplaining as a service".
MaaS is very good...
In somewhat old-fashioned terms, bare syntax vs. syntax+semantics.
A point which at some level seems to have been fairly well understood about 50 years ago (in the computer community, specifically, much longer elsewhere).
I don't follow what's being done but I assume something is still being done on the alternate pathway. Though if you want semantics you can't just throw a large dataset at it and hope that something will emerge (doesn't work too well for people either, past the language acquisition/visual analysis phase).
We do know how to do semantics, theoretically.
On the other hand, I've seen students over their heads in class because they're stuck in a mode of trying to answer questions without building a semantics for the subject matter. Which can work for a very long time in a traditional academic setting; but they do hit a dead end.
What I think is that the ChatBot does not have a theory of mind for you, it has only a theory of text. So if the response to the prompt "Why was humanity so poor back in the long Agrarian Age?" does not include the word "Malthusian", that is because whenever the text it was trained on - presumably Slouching Towards Utopia? - says that humanity was poor back in the long Agrarian Age, it does not usually mention that this was a Malthusian condition. When you wrote the text, you were thinking "Malthusian", because you have a theory about the Agrarian Age; and when I read that text, I was thinking "Brad is thinking Malthusian", because I have a theory of mind for Brad DeLong. But that's the difference between ChatBot and me.
Put in the terms you use below, the ChatBot can go to a corner of its training space, and its space is a very very high dimensional hypercube not legible to the human mind, so this allows it to surprise us; but it cannot create a corner of space where it was never trained.
It seems these modern transformers still have a problem of “forgetting” text and information given at a certain distance.
Yep. The "attention" mechanism helps diminish this problem a lot...
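To make the point about "attention" concrete, here is a minimal sketch of scaled dot-product attention weights in plain Python. The vectors and the names `attention_weights`, `keys`, and `query` are toy assumptions for illustration, not anything taken from a real model, and positional encodings are left out. It only shows that the weight a token receives depends on how well its key matches the query, not on how far back it sits. A real model still has a finite context window, so text beyond that window is simply not visible.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(scores)

# Toy setup (assumed values): six earlier tokens, and only the most
# distant one (index 0) has a key that matches the current query.
keys = [[1.0, 0.0]] + [[0.0, 1.0]] * 5
query = [4.0, 0.0]

weights = attention_weights(query, keys)
for position, weight in enumerate(weights):
    print(f"token {position}: weight {weight:.3f}")
```

The most distant token gets the largest weight here because its key matches the query, which is the sense in which attention reduces the "forgetting" of far-away text.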
My knowledge of this AI is from cursory reading of the technology. I have a general idea of the strategy and goals of the transformation, though I have no idea how correct I am. But it has led to some ideas. I wonder what would happen if the hidden database were again sent through the transformer, thus bringing long distance terms even closer together? The idea would be similar to taking derivatives in calculus. With an equation representing distance as a function of time, the first derivative yields velocity (d/t). The second derivative yields acceleration (d/t^2). With each differentiation, a term is lost. Likewise, when integrating, a constant needs to be added back.
So with this model, starting with a paragraph, you have a thought. Transforming it again yields a concept. Transforming again yields a theme. Reverse the process to get an output. This process may yield different types of output across similar disciplines. Of course this may be just my imagination running wild.
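The calculus analogy in this comment can be written out as a worked case. This is only an illustration, assuming motion under constant acceleration; the symbols $x_0$, $v_0$, $a$, and $C_1$ are introduced here and do not come from the thread.

```latex
\begin{align*}
x(t) &= x_0 + v_0 t + \tfrac{1}{2} a t^2
  && \text{distance as a function of time} \\
\frac{dx}{dt} &= v_0 + a t
  && \text{velocity: the term } x_0 \text{ is lost} \\
\frac{d^2 x}{dt^2} &= a
  && \text{acceleration: the term } v_0 \text{ is lost} \\
\int a \, dt &= a t + C_1
  && \text{integrating back needs } C_1 = v_0 \text{ supplied from outside}
\end{align*}
```

Each differentiation discards information that integration alone cannot recover, which is the sense in which repeated "transformation" would compress the text.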
I think as you feed it to itself over and over again things become compressed, but after two or three passes you lose coherence pretty completely.
I spent some time thinking about this. I suppose if doing transformations on the meta- and meta-meta-data has been done, then it may be a dead end.
The opening chapter of Slouching was about Thomas Malthus's Devil and the conflict between food growth and population growth. Keynes and Malthus's Devil are mentioned together in the same sentence 3 pages later. Keynes is then mentioned a few more times.
If the desire is for a more thematic analysis, then a meta-layer might be created based on the word embeddings and the position array. For example, for every position group, the meta-layer might hold a query made of the top 3 or 5 keywords, and a score is calculated for the group's relevance. Essentially this would be scoring an entire sentence or paragraph. So clusters of closely related ideas might indicate an important theme. From the theme, the query words can be recovered by drilling backwards to the embedded scores for each word in the decoding phase.
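A rough sketch of that idea, under heavy assumptions: the three-dimensional embeddings in `EMBED` are made up by hand, and `group_relevance` and `word_scores` are hypothetical helper names, not part of any transformer library. Each passage is scored against a keyword query by cosine similarity of averaged word vectors, and the per-word scores show how one might "drill backwards" from a high-scoring passage to the words that drove the score.

```python
import math

# Hand-made toy embeddings (assumed values, for illustration only).
EMBED = {
    "malthus":    [1.0, 0.1, 0.0],
    "population": [0.9, 0.2, 0.0],
    "food":       [0.8, 0.3, 0.1],
    "keynes":     [0.1, 1.0, 0.0],
    "demand":     [0.0, 0.9, 0.2],
    "robot":      [0.0, 0.1, 1.0],
}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(words):
    """Average the embeddings of the known words in a group."""
    vectors = [EMBED[w] for w in words if w in EMBED]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def group_relevance(passage_words, query_keywords):
    """Score a whole sentence or paragraph against the keyword query."""
    return cosine(mean_vector(passage_words), mean_vector(query_keywords))

def word_scores(passage_words, query_keywords):
    """Drill back down: score each word of a passage against the query."""
    query_vec = mean_vector(query_keywords)
    return {w: cosine(EMBED[w], query_vec) for w in passage_words if w in EMBED}

passages = {
    "p1": ["malthus", "population", "food"],
    "p2": ["keynes", "demand"],
    "p3": ["robot"],
}
query = ["malthus", "food", "population"]  # the "top 3 keywords"

scores = {name: group_relevance(words, query) for name, words in passages.items()}
best = max(scores, key=scores.get)
print(best, word_scores(passages[best], query))
```

Passages whose scores sit close together would be candidates for a shared theme; any real version would need learned embeddings and a clustering step, both of which are outside this sketch.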