4 Comments
Kaleberg's avatar

There are two components to a message, the carrier and the signal. LLMs definitely get the carrier right. What they produce is almost always in the correct form. They often get the signal right, but all too often they get the signal wrong. The problem is that it takes time and work to verify any answers.

Brad DeLong's avatar

The fact that LLMs produce linguistic fluency is itself a meta-signal of their trustworthiness, which is dangerous. As I say, you poke them into a subset of their training data where the information is reliable because testable and tested—cough, programming, cough—and you are likely to be on rock-solid ground. (And, if not, you are likely to discover that you are not, as things either fail or produce totally implausible results.) But give them something that has a single, certain, correct answer, and their lack of any plausible world model leads to epic fails:

> **Kaleberg**: There are two components to a message, the carrier and the signal. LLMs definitely get the carrier right. What they produce is almost always in the correct form. They often get the signal right, but all too often they get the signal wrong. The problem is that it takes time and work to verify any answers...

My current favorite epic fail remains this:

* **DeLong, J. Bradford**. 2026. "MAMLMs Still Epic Fail Open‑Book, Closed‑World, Finite‑List, Obvious Ground Truth Tasks". _DeLong's Grasping Reality_. Feb 24. <https://braddelong.substack.com/p/mamlms-still-epic-fail-openbook-closedworld>: 'A failure nine different successive times. There really is a unique, well‑defined answer, and where the machine has every chance to uncover it: the Hedge Knight Ashford Meadow line‑up. This avoids the usual escape hatches about “the data might be ambiguous” or “this is a hard open question.” We get not one isolated “hallucination” but rather a hallucination cascade…

(And I have not even gotten to its inability to transfer information about what the knights' devices should look like from text to image: a Baratheon Stag breastplate **and** a not-Targaryen one-headed not-four-but-two-legged dragon on your shield? Come on! AGI this ain't.)


paul wolfson's avatar

A pedantic point. The section titled "Making this sound like Marx's Capital" includes the following relationship:

Intellect:Intelligence :: Labor:Capital :: Dead labor:Living labor

I think the middle fraction is inverted, at least relative to the last one. IIRC, Capital is Dead Labor while Labor is Living Labor.

Cosma's avatar

Oh dammit, I thought I'd fixed that.