Discussion about this post

Dennis Clark

At the peril of missing the point, which model did you use? I gave the question to Opus 4.6, which nailed it. Asked to cross-reference with a source, it did that too. More generally you’re right of course that the right answer only ever comes out by semi-controlled happenstance, which is why all the agentic loop things that work are reliant on external judgments of correctness (did the function run? etc.) to iterate on.
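To make the "external judgments of correctness" point concrete, here is a rough Python sketch of that kind of loop. Everything in it is an assumption for illustration: `fake_model_outputs` is a hard-coded stand-in for successive model generations, and `runs_and_passes` and `iterate_until_correct` are hypothetical names, not any real agent framework's API.

```python
from typing import Callable, Iterable, Optional


def runs_and_passes(source: str) -> bool:
    """External check: does the candidate define add(), and is add(2, 3) == 5?"""
    namespace: dict = {}
    try:
        exec(source, namespace)  # "did the function run?"
        return namespace["add"](2, 3) == 5
    except Exception:
        # Syntax errors, missing names, and runtime failures all count as "no".
        return False


def iterate_until_correct(
    candidates: Iterable[str],
    check: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate the external check accepts, or None."""
    for attempt, candidate in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break
        if check(candidate):
            return candidate
    return None


# Hypothetical model outputs standing in for successive generations.
fake_model_outputs = [
    "def add(a, b) return a + b",        # does not even parse
    "def add(a, b):\n    return a - b",  # runs, wrong answer
    "def add(a, b):\n    return a + b",  # runs, right answer
]

accepted = iterate_until_correct(fake_model_outputs, runs_and_passes)
```

The generator never decides that it is right; the loop only stops when the check outside it says so, or when the attempt budget runs out.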

alfgifu

I spent an hour or so yesterday afternoon trying to talk Copilot into summarising a set of articles from eight organisations about the same policy announcement; I gave it links to the articles.

Where it accessed the correct article it was able to produce concise summaries and sentiment analysis that was reasonably solid; it is good at drawing out key patterns from text and presenting them back in a grid. HOWEVER. Every damn time it hallucinated something. I know these organisations and had direct access to the articles; I also had it attaching links to sources, so I could see when it opted to disregard a link I'd provided and based its analysis on commentary from some other article. Once or twice it told me confidently that an organisation hadn't published a reaction and provided summaries of the policy itself from the government webpage, or hypothesised their reactions based on previous summaries.

Eventually I wrestled it into submission, sharing the links in smaller batches, asking for four organisations at a time, and finally copying the text of the last article directly into Copilot since it seemed impossibly wedded to referring to a truncated LinkedIn summary instead (!)

This was all *extremely* irritating, but by the end of the hour I had a concise table which did indeed provide a useful overview of what eight key organisations had said, in their own words and from their own websites, about the policy announcement. Writing the same thing myself would have been a MUCH pleasanter experience, but would have taken me three times as long.

Long enough, to be honest, that it would have been at the end of my list of things to do and I might well not have got round to it; I would have passed on the links to the articles to colleagues, most of whom wouldn't have had time to read them.

All of which is to say, if you understand the limits of the tool and work round them, it certainly does make it possible to do more. But UGGHHHH I do not enjoy the experience. And I do wonder how much of the strength of feeling against AI from a lot of creative or intellectual types is just this kind of irritation at finding that tasks that were mildly enjoyable have become a wrestling game with a super-powered toddler.
