The Unsolved Puzzle of Apple Computer's "AI" Misadventures Over the Past Year
Not that I have any answers. But I do have lots of questions. And there are interesting tidbits of information out there, as we contemplate the inability of Apple Computer’s Siri to make any significant progress at threading the labyrinth of “AI”. Is there more to the story than secrecy, institutional sclerosis, & managerial overconfidence resulting in missed deadlines, internal confusion, & top executives depriving themselves of situational awareness?…
I do not know whether Mark Gurman and Drake Bennett’s source here thinks he is making Apple Computer’s Senior Vice President of Software Engineering Craig Federighi look good or is knifing him in the back:
Mark Gurman & Drake Bennett: Why Apple Still Hasn’t Cracked AI <https://www.bloomberg.com/news/features/2025-05-18/how-apple-intelligence-and-siri-ai-went-so-wrong>: For the Siri upgrade, Apple was targeting April 2025…. But when Federighi started running a beta of the iOS version, 18.4, on his own phone weeks before the operating system’s planned release, he was shocked to find that many of the features Apple had been touting—including pulling up a driver’s license number with a voice search—didn’t actually work…. (The WWDC demos were videos of an early prototype, portraying what the company thought the system would be able to consistently achieve.) The planned rollout was delayed until May and then indefinitely, even as the features were still being promoted on commercials…
In some ways it is worse than Gurman and Bennett said: they do not remind their readers that the vibe from the commercials was not one of “these features are coming soon” but rather one of “these features of Apple Intelligence are here now!”
And if Gurman and Bennett’s sources think that they are doing Federighi a favor by portraying him as so lacking in situational awareness that he has to open a newly released beta on his own device in order to learn that tentpole features are not shipping—that V1 is unshippable, and V2 is nowheresville—then I question their sanity.
Given that Apple Computer has been, for nearly a decade, shipping versions of Apple Siri that are far behind the competition and not up to anything one might think of as Apple Computer quality standards, it is hard to see the current Apple Computer party line that “we weren’t able to achieve the reliability in [the New Siri with its V2 architecture] in the time we thought…” as anything other than euphemism.
This is particularly so given that there seems to have been a substantial amount of schizophrenia among Apple’s leadership with respect to AI a year ago. On the one hand, back then they recognized, as Craig Federighi put it to Joanna Stern earlier this week, that “when it comes to automating capabilities on devices in a reliable way, no one’s doing it really well right now...” On the other hand, against this background, Federighi also says “we wanted to be the first. We wanted to do it best…” Moreover, they thought that they could do so. That meant producing, in nine months, without much of a past runway, and without giving John Giannandrea’s AI team the GPU chips they thought they needed, a context-aware agent-chain AI system beyond the frontier that any of the companies spending much more on the problem had managed to reach. Perhaps the key problem is that Apple’s historical strength has been its insistence on “it just works”, reliably. But if we know one thing about generative AI, with all its hallucinations, its recalcitrant edge cases, and its inability to reliably shift from interpolation to extrapolation, it is that the last thing it can do is “just work”.
And here I am of two minds. On the one hand, this would seem crazy unless there had been truly extraordinary breakthroughs substantially more than a year ago. On the other hand, as John Gruber has written: “Apple’s executives aren’t crazy…” They announced that they could do it at the 2024 WWDC, and, again quoting Gruber:
Multiple trusted sources… [n]ever saw an internal build…[of] this [as of the 2024 WWDC]…. They don’t believe there was one, because… they believe that if there had been such a build, their teams would have had access to it. Most rank and file engineers within Apple do not believe that feature existed…. The first any of them ever heard of it was when they watched the keynote with the rest of us on the first day of WWDC last year. [BUT] I’m quite certain Apple’s executives believed…. Crazy to announce… if they didn’t believe they could ship it…
So just how much AI-overpromising did Apple Computer do at its 2024 WWDC keynote? And how did the Apple executives convince themselves that their overpromising was only at the margin? And why?
Lots of questions. I do not have any good answers. I do, however, have some things that strike me as highly informative:
John Gruber writes:
John Gruber: Apple’s Spin on the Personalized Siri Apple Intelligence Reset <https://daringfireball.net/2025/06/apples_spin_on_the_personalized_siri_apple_intelligence_reset>: The whole “Siri, when is my mom’s flight landing?” segment of last year’s WWDC keynote… was never demoed…. The keynote video didn’t show the actual feature working. It kept cutting away from the iPhone that was purportedly performing the feature back to presenter Kelsey Peterson…. Apple’s internal rules for keynote demos is that the entire feature has to be real, and capturable in a single take of video…. They’ve got really strict rules about everything being real. That doesn’t mean they always show the feature in a single take in the final cut… but it has to be possible….
But [in] that Siri demo in last year’s keynote…. There’s not one single shot in the whole demo that shows one action leading to the next. It’s all cut together in an unusual way for Apple keynote demos. Go see for yourself at the 1h:22m mark….
Multiple trusted sources… [n]ever saw an internal build… that had this [then]…. They don’t believe there was one, because… they believe that if there had been such a build, their teams would have had access to it. Most rank and file engineers within Apple do not believe that feature existed…. The first any of them ever heard of it was when they watched the keynote with the rest of us on the first day of WWDC last year.
I’m quite certain Apple’s executives believed…. Crazy to announce… if they didn’t believe they could ship it, and Apple’s executives aren’t crazy. I’m also quite certain that eventually there was a functional implementation of the now-abandoned “v1”… but it was unreliable with no path forward to make it reliable. (I think it was far worse than “not up to Apple’s high standards”)…
The transcript of the relevant part of the WWDC keynote is this, from Apple Computer’s Director of Machine Learning and AI:
Kelsey Peterson: WWDC 2024 Keynote <https://developer.apple.com/videos/play/wwdc2024/101/>: ‘I want to show you one more demo that will give you a sense for how powerful Siri will be when it draws on the personal context awareness and action capabilities built into Apple Intelligence.
Imagine that I am planning to pick my mom up from the airport, and I'm trying to figure out my timing. Siri is going to be able to help me do this so easily. “Siri, when is my mom's flight landing?” What's awesome is that Siri actually cross-references flight details that my mom shared with me by email with real-time flight tracking to give me her up-to-date arrival time. “What's our lunch plan?” I don't always remember to add things to my calendar, and so I love that Siri can help me keep track of plans that I've made in casual conversation, like this lunch reservation my mom mentioned in a text. “How long will it take us to get there from the airport?” I haven't had to jump from Mail to Messages to Maps to figure out this plan. And a set of tasks that would have taken minutes on my own and honestly probably would have resulted in a call to my Mom could be addressed in a matter of seconds.
That's just a glimpse of the ways in which Siri is going to become more powerful and more personal thanks to Apple Intelligence.
And all of these updates to Siri are also coming to iPad and Mac, where Siri's new design is a total game-changer. It makes Siri feel seamlessly integrated with your workflow. Thanks to the capabilities of Apple Intelligence, this year marks the start of a new era for Siri. Here's Justin to show you more places throughout the system where Apple Intelligence simplifies and accelerates your tasks…
As I wrote last week,
this year’s version of Apple Computer seems to be much more firmly based in reality. Rather than pushing a beyond-the-frontier context-aware agentic chatbot as its tentpole feature, it is highlighting things like
opening up its on-device foundation AI models to third-party developers (see the sketch after this list)…
thus allowing them to build private, intelligent features into their own apps…
focusing on platform-level AI integration…
focusing away from chatbots-as-interface-for-everything…
focusing on AI as an embedded, context-sensitive layer across all platforms…
and doubling down on privacy, on-device processing, and seamless integration…
thus recognizing that no single firm—even one as mighty as Apple—can innovate alone, for the platform, not the product, is the principal locus of value creation…
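To make the first item above concrete, here is a minimal sketch of what “opening up its on-device foundation AI models to third-party developers” looks like from a developer’s point of view, based on the FoundationModels framework Apple described at WWDC 2025. The specific names here (SystemLanguageModel, LanguageModelSession, respond(to:)) are my reading of that announcement, not verified against a shipping SDK, so treat this as illustrative rather than authoritative:

```swift
import FoundationModels

// Hedged sketch: summarize a user's note entirely on-device, using the
// system foundation model Apple said it is exposing to third-party apps.
// Exact API names are assumptions drawn from the WWDC 2025 announcement.
func summarize(_ note: String) async throws -> String {
    // The system model may be unavailable (unsupported hardware,
    // Apple Intelligence turned off, model still downloading).
    guard case .available = SystemLanguageModel.default.availability else {
        return note  // degrade gracefully: hand back the raw text
    }
    // A session is a stateful exchange with the on-device model; the
    // note never leaves the device, which is the privacy selling point.
    let session = LanguageModelSession(
        instructions: "Summarize the user's note in one sentence."
    )
    let response = try await session.respond(to: note)
    return response.content
}
```

The design point the list is gesturing at is that the model becomes a platform capability, like Core Location or StoreKit, rather than a destination app: every third-party app gets the same private, on-device inference without shipping its own weights.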
And here we have Craig Federighi and Greg Joswiak explaining themselves to Joanna Stern:
Stern: Last year, you announced a smarter AI-driven Siri. Where is she?
Federighi: We had a really two-phase plan: two versions of an architecture to deliver a great Siri. And as we got into the [2024] conference, we had V1 working to do basic capabilities that we showed off at the conference. So we had some real software we were able to demonstrate there and show what was coming. But it didn’t converge in the way, quality-wise, [in the way] that we needed it to. We had something working, but then, as you got off the beaten path, and we know with Siri, it’s open-ended what you might ask it to do and the data that might be on your device that would be used in personal knowledge. And we wanted it to be really, really reliable. And we weren’t able to achieve the reliability in the time we thought.
Stern: But there was a working version of this? This wasn’t just vaporware?
Federighi: Oh, no, no, no, no, no. Of course, no. We were filming real working software with a real large language model, with real semantic search. That’s what you saw.
Stern: Okay.
Joswiak: There’s this narrative out there that it was demoware only. No, it was again something we thought, as Craig said, [we] would actually ship by later in the year. Look, we don’t wanna disappoint customers. We never do. But it would’ve been more disappointing to ship something that didn’t hit our quality standard, that had an error rate that we felt was unacceptable. So we made what we thought was the best decision. I’d made it again.
Stern: It’s great that you set this high bar, but you’re also Apple. I mean, you’ve got more engineers, more cash than most companies, maybe any company. Why couldn’t you make it work?
Federighi: This is new technology. When it comes to automating capabilities on devices in a reliable way, no one’s doing it really well right now. And we wanted to be the first. We wanted to do it best. And like I said, we had very promising early results and working initial versions, but not to the level. As we began living on it internally, [we began] feeling like, “This just doesn’t work reliably enough to be an Apple product.” So this stuff takes hard work, but we do see AI as a long-term transformational wave, as one that’s going to affect our industry and, of course, our society for decades to come. We wanna get it right. There’s no need to rush out with the wrong features and the wrong product just to be first.
Stern: So many people associate Apple and AI with Siri since plus 10 years ago now. And so there is a real expectation that Siri should be as good, if not better, than the competition.
Federighi: Oh and I think, ultimately, it should be. That’s certainly…
Stern: But it’s not right now.
Federighi: That’s certainly our mission. That’s our mission. We set out to tell people last year where we were going. I think people were very excited about Apple’s values there, an experience that integrated into everything you do, not a bolt-on chatbot on the side, something that is personal, something that is private. We started building some of those and delivering some of those capabilities. I in a way appreciate the fact that people really wanted the next version of Siri, and we really wanna deliver it for them. But we wanna do it the right way.
Stern: When’s the right way gonna come along?
Federighi: Well, in this case, we really wanna make sure that we have it very much in hand before we start talking about dates for obvious reasons.
Stern: And will that include these features that you had previously announced and more? Is this the effort to make Siri this more interactive AI companion?
Federighi: Look, on the one hand, I would love to dish about my enthusiasm for our future plans. But that’s exactly what we don’t wanna do right now.
Stern: Makes sense.
Federighi: Misset expectations. We wanna deliver something great that you and all of our customers really appreciate.
Stern: You have mentioned Apple Intelligence a lot, and you know, to be honest, I’m not really a big user of Apple Intelligence. I’m using a lot of your competitors’ products. Can you or will you keep up with that competition?
Joswiak: It’s important to realize our strategy’s a little bit different than some other people. Our idea of Apple Intelligence is using generative AI to be an enabling technology for features across our operating system. So much so that, sometimes, you’re doing things you don’t even realize you’re using Apple Intelligence or, you know, AI to do them. And that’s our goal, integrate it. There’s no destination. There’s no app called Apple Intelligence, which is different than like a chatbot, which again, what I think some people have kinda conflated a bit. Like, “Where’s your your chatbot?” We didn’t do that. What we decided was that we would give you access to one through ChatGPT, because you know, we think that was the best one. But our idea is to integrate across the operating system, make it features that, you know, I certainly use every day.
Federighi: AI is one of those massive technological waves, like the internet, like mobility. When you look at the internet, I don’t think anyone was saying, “Gosh, Apple, I find myself using amazon.com, and I use that a lot. Why don’t you have one of those?” “I find this web search thing really useful. I’m enjoying, you know, streaming cat videos. I find this useful. Why is this not in your product?” Well, of course, the internet was vast. It was opportunity for many, many companies, for users to do a wide diversity of things. It was also a huge enabler for Apple, and I think Apple made the internet accessible in a lot of ways more than anyone. And it was super empowering for our customers and for our products. But that didn’t mean that every experience that you might take on necessarily is gonna happen inside of Apple or ultimately happen with Siri.
Stern: Between you both, you have, I believe, around 60 years of experience at this company. You’ve seen the company go through highs, through lows. Where do you think you are right now? There’s a lot of sentiment that Apple is on its back foot here.
Federighi: I think you’re right to bring up that perspective, because I think when you have been through different waves, you’re very accustomed to the ups and downs. And I think we’re feeling really good right now.
Joswiak: I hate to be naive, but I remember Steve [Jobs] had come back, and he told us, “Look, what we have to do is create great products and tell people about ’em. And if we do that, everything else will work out.” And it turns out, that is kinda the case. And we create great products, and we think our products are exceptionally great right now and keep getting better. I’ll hang onto that naive theory that if we build great products, we tell people about ’em, everything else will work out…
The party line that quality and reliability take precedence over speed to market—even if it means ceding ground to competitors like Google and OpenAI in the interim—is a very good one. But it has at least one significant problem, at least as Federighi and Joswiak lay it out: Apple Computer has kept shipping Apple Siri, which has not been an Apple-quality product for a long time. The consensus is that the moment when you could claim Apple Siri was punching equally with Amazon Alexa ended in 2016. And yet Apple continues to praise and ship Siri as it is, without its shortcomings having triggered (before now) the five-alarm internal fire response they should have triggered at least five years ago.
Indeed, Apple has been behaving oddly over the past five years or so. Its extraordinary silicon hardware renaissance aside, it has sent out less of the vibe of “make great products” and more of “grab for every dollar of services revenue even though it makes our platforms worse” and “lock things down so we never again become as dependent on anyone as we were on Microsoft and Adobe back in the day”.
I think the problem is that LLMs are less than meets the eye. Apple got punked. LLMs are very good at impressing the uninformed, and that includes upper management. As Cory Doctorow noted, they don't have to be good enough to do your job; they just have to be good enough to convince your boss that they can do your job.
It's very easy to imagine a v1 that was tuned to work on one battery of tests, to the point where the engineering management involved was convinced that just another six to twelve months of tuning would get something working more generally. Unfortunately, programming LLMs is not like writing code. There are lots of people with a vague sense of how far along a piece of software is and how close it is to meeting its goals. They're usually wrong in detail, but often close enough to estimate ship dates. The joke is that the best way to improve that skill is to multiply your first estimate by your age.
LLMs are another matter. These systems are not transparent and not robust. Remember when the big thing was tricking computer vision systems by putting small stickers on stop signs to convince them that a firetruck was blocking the way? LLMs are all too similar in operation, so v1 was a lot farther from release than anyone at Apple thought. They found out the hard way, as so many others will.
A day late, but here are my thoughts about what Apple is undertaking in trying to get Siri up to snuff, and why it's a hard problem to solve:
1) Apple doesn't just want a reliable LLM interface to their products, such as what Microsoft is doing with Copilot. They also want a reliable voice interface, and that's a whole other set of challenges that no leading AI company has solved yet.
2) Apple wants their AI-enhanced Siri to work *offline*, and that's a significant challenge. Perhaps even more difficult than they were expecting.
2a) Using any generative AI at full capability right now requires an entire farm of servers with thousands of Nvidia chips networked in just the right way, with full access to reams of training data. That's a lot of overhead to make every little query and comment work.
2b) You can run simplified AI models locally on your PC, but it requires an expensive Nvidia GPU, often a part that consumes more than 300 watts under load and has a physical footprint measured in tens of inches/centimeters. On a single chip with a smaller set of data, capabilities are extremely… modest.
2b1) I mention Nvidia chips because nobody has an alternative to their chips now or in the immediate future. Apple's chip designs are amazing, but they were not developed for the work of generative AI. Running a local LLM on your iPhone just isn't possible with Apple's current chip architecture, and they can't simply copy Nvidia because of patents, yes, but also because Nvidia's designs are not efficient enough for mobile, and because Apple doesn't want to be dependent on anyone else.
3) The networking required to synchronize an LLM's operations across a server farm and a local device, with different chips, different capabilities, different latencies, and data lost to cellular or Wi-Fi conditions, is a nontrivial challenge, but Apple's vision for AI Siri requires solving that too.
4) We haven't even gotten to the software running Siri and all the little agentive interfaces into every app and program on your iPhone or your Mac.
So, Apple basically needs to be the first company to solve several different but related hard problems in order for Siri to function in the way they envision. They need an in-house AI chipset that is efficient enough for mobile, because Siri needs to be able to do some “thinking” on your local device, potentially offline. That chipset, when online, needs to function well with the large server farms, something nobody else is trying to do at the moment, as far as I know. There's a reason all of the compute is being done in server farms and not distributed across everyone's individual devices! Oh, and they've got to get a class-leading voice interface off the ground so Siri properly understands people of varying accents, languages, and blood alcohol content.
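For what it's worth, here is a hypothetical sketch of the hybrid routing that vision implies: try a small on-device model first, and hand the request to a server-side model only when the device is online and the query looks too hard for local-only inference. Everything here (OnDeviceModel, CloudModel, the placeholder endpoint, the crude difficulty check) is invented for illustration; it is not Apple's actual Siri or Private Cloud Compute architecture.

```swift
import Foundation

// Hypothetical sketch of hybrid local/server routing. The types and the
// endpoint below are invented for illustration; they are not Apple APIs.
protocol AssistantModel {
    func answer(_ query: String) async throws -> String
}

struct OnDeviceModel: AssistantModel {
    // Stand-in for a small quantized model running on the device's NPU/GPU.
    func answer(_ query: String) async throws -> String {
        // A real implementation would also need to signal "I don't know"
        // for queries outside the small model's competence.
        return "(local draft answer for: \(query))"
    }
}

struct CloudModel: AssistantModel {
    let endpoint = URL(string: "https://example.com/llm")!  // placeholder, not a real service

    func answer(_ query: String) async throws -> String {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.httpBody = try JSONEncoder().encode(["query": query])
        let (data, _) = try await URLSession.shared.data(for: request)
        return String(decoding: data, as: UTF8.self)
    }
}

// Route easy queries to the on-device model; fall back to the server
// only when the device is online and the query looks too hard locally.
func respond(to query: String, isOnline: Bool) async -> String {
    let looksHard = query.count > 200  // crude stand-in for a real difficulty/coverage check
    if !looksHard, let local = try? await OnDeviceModel().answer(query) {
        return local
    }
    guard isOnline else { return "Sorry, I can't help with that offline." }
    return (try? await CloudModel().answer(query)) ?? "Something went wrong."
}
```

Even in this toy form you can see where the pain lives: deciding which queries the local model can be trusted with, and doing something sensible when the answer is “none of them” and there is no network.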
I have no idea why anyone at Apple thought they would solve all of this in a matter of 12-18 months! I suppose they underestimated the challenges involved.