Discussion about this post

Philip Koop

It seems to me that the "bitter lesson" is a particular example of a more general principle, which you might call "make it up in volume" crossed with "less is more". It underlies the industrial revolution, for example: early machine-made artifacts were usually inferior to their hand-made counterparts, but they were so vastly cheaper that one could have many, many more machine-made artifacts than hand-made ones. The technology has since developed to the point that machines can make artifacts that we could never duplicate by hand.

The particular example that fascinates me, even more than AI, is the alphabet (the subject of the Logarithmic History post for 27 September). Doug Jones:

"To any Egyptian scribes who saw it, this script must have looked like a laughably dumbed-down version of hieroglyphs, stripping signs of their meanings, then painfully spelling out with a whole string of characters what hieroglyphs often managed in a single sign or two."

[...]

"Logographic writing systems like Egyptian hieroglyphs were devised multiple times: in Mesopotamia, China, and Meso-America. The alphabet however, was invented just once."

Just because an idea is simple and destined for outstanding success doesn't mean it is easy or obvious.

Alex Tolley

Gary Marcus was gleeful that Sutton has walked back the idea that "infinite scaling" will solve all AI (LLM) issues. That may be true for the LLM approach, but it doesn't mean that scaling won't work for a different AI approach.

Consider much simpler technologies: expert systems and decision trees. Rule-based systems, common in the symbolic AI era, started with hand-crafted expert systems. They proved brittle and failed to capture nuance, so they went into the dustbin of AI techniques, although they are still the core of 1980s books on Prolog.

Then the focus shifted to class separation: "Which decision to take? [yes/no, and which choice]" and object classification: "[in/out of a class, and which class?]". The ID3 and later CART algorithms were core to a machine learning approach that leveraged the computational effort of computers. Provide a table of variable states/values along with the outcome/classification, and the computer would very quickly create a pruned Decision Tree. It wasn't very fast on old Intel 8088 CPUs, but today a decent PC can produce a result in less than a second for tables of hundreds of rows and tens of variables.

But DTs were still somewhat brittle. Increased computational power then allowed the DT algorithm to be run on many altered input tables (leave out some variables or some table rows, repeat the algorithm many times, and take the majority classification). This is called a Random Forest. It proved far more robust while remaining very fast, with results comparable to ANNs, which remained computationally heavyweight. RFs still remain my favoured ML approach for a quick check of whether data has structure. It also reminds me of Calvin's "The Cerebral Code" on how the brain (cerebrum) works.
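
To make that bagging idea concrete, here is a minimal sketch assuming scikit-learn and a synthetic table; the dataset, parameters, and names are illustrative, not taken from the comment:

```python
# Minimal sketch: a single decision tree vs. a random forest on a small
# synthetic table, using scikit-learn (data and parameters are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A "table" of a few hundred rows and tens of variables, as in the comment.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
# The forest re-runs the tree algorithm on altered tables: bootstrap-sampled
# rows plus random subsets of variables, then takes the majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```

On a typical run the forest's cross-validated accuracy beats the single tree's, which is exactly the "many perturbed tables plus majority vote" robustness described above.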

But clearly, Random Forests have limits on the problems they can solve. Multilayer Perceptrons (ANNs) are similar: great at classifying, but not at other tasks. Transformer and attention architectures provided the basis for Large Language Models (LLMs), enabling computers to handle language. They have proven remarkably versatile, at least partially solving some problems that were unexpected. But scaling has now shown that further improvements sit near the upper bound of the technology's logistic curve of problem solving. As Marcus opines, LLMs do not understand the world, and never will. This doesn't mean scaling is no longer generally operative, just that it will no longer work for this particular technology. LeCun has already indicated that we need a new AI technology to go further; Hassabis, focusing more on scientific problems, wants more neurosymbolic hybrid technology. We don't yet know what the next breakthrough will be, but when we get it, scaling will likely start the next logistic (S-curve) of gains.
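
For reference, the core operation behind those attention architectures is standard scaled dot-product attention; here is a minimal NumPy sketch (the shapes and random data are purely illustrative):

```python
# Minimal sketch of scaled dot-product attention, the core computation of
# transformer LLMs (standard formula; shapes here are illustrative).
import numpy as np

def attention(Q, K, V):
    # Similarity of every query to every key, scaled by sqrt(key dimension).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Numerically stable softmax over keys turns scores into mixing weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, dim 8
print(attention(Q, K, V).shape)  # (4, 8)
```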

