Beyond the SmartPhone? Or an Actually Functional Siri without a Screen?: The Altman–Ive Ambition Is Going to Meet Physics, Compute, & Etiquette.
I smell a “welcome to the NFL” moment coming for this piece of OpenAI’s ambitions. But I also know that it is much too early and information is still much too thin for speculation to be worthwhile. A palm-sized, “always-on” screenless device sounds liberating—until battery limits, privacy needs, & cloud bills arrive. It will be very difficult to beat the slab on speed and certainty, & after that you still face the hurdles on behavior change and trust. To me, at least, the only possible victory looks like an info-butler with on‑device reflexes, not a talkative “friend” with a personality…
When I first read it, I thought there really was no point to Tim Bradshaw and company writing this story.
Yes, Jony Ive and Sam Altman’s teams are probably doing something very interesting. But the reporters could not uncover enough meat for it to be worthwhile yet for us to analyze what their plans are likely to produce next year.
But, still, I found it too interesting to resist, and I could not stop thinking about it over the past week:
Tim Bradshaw, Cristina Criddle, Michael Acton, & Ryan McMorrow: OpenAI and Jony Ive grapple with technical issues on secretive AI device <https://www.ft.com/content/58b078be-e0ab-492f-9dbf-c2fe67298dd3>: ‘OpenAI and Ive are seeking to build a more powerful and useful machine. But two people familiar with the project said that settling on the device’s “voice” and its mannerisms were a challenge. One issue is ensuring the device only chimes in when useful, preventing it from talking too much or not knowing when to finish the conversation…. “The concept is that you should have a friend who’s a computer who isn’t your weird AI girlfriend . . . like [Apple’s digital voice assistant] Siri but better,” said one person who was briefed on the plans. OpenAI was looking for “ways for it to be accessible but not intrusive”. “Model personality is a hard thing to balance,” said another person close to the project. “It can’t be too sycophantic, not too direct, helpful, but doesn’t keep talking in a feedback loop.”… OpenAI’s device will be entering a difficult market…
And Adam Tooze asks:
Adam Tooze: Top Links 888 <https://adamtooze.substack.com/p/top-links-888-the-vaccine-paradox>: ‘Is the concept behind the multi-billion Ive-Altman tie up really all about “no more weird digital girlfriends”… and no “snarky friends” either? Is that it? Some kinda nicely balanced masculinity?…
I am not quite sure what that last word is doing there. It is lifting off from the person “briefed on the plans” who claimed that the desired product would present itself as “a friend who’s a computer who isn’t your weird AI girlfriend…” The big questions are: What is meant by “friend” here? Why not just a SubTuring info-butler with a natural-language interface? And why “masculinity”?
And then there is the regrettable truth that I was right in my first reaction: there really is no point to thinking about this because the reporters could not uncover sufficient meat.
Nevertheless…
The underlying hardware technology of this first in a series of devices is presumably a not-phone that sits in your pocket, around your neck, or on your wrist: a palm-sized, screenless, “always-on” ambient assistant. It listens to what you say and hear, ideally sees what you see, watches what your smartphone does, and is ready to respond with the right information when you say “computer…” It then also does a little bit more, but without overtalking: it intervenes only where people will be happy that it decided to speak up.
That is, given technological capabilities right now, a very heavy lift indeed. And it seems to me that trying to make it a “friend” rather than an info-butler is making your lift much heavier than it has to be. Conversational “personality” and turn-taking are nontrivial. Avoiding sycophancy or endless loops is an open UX problem. An info-butler is a much easier role to fill.
And here I should put the rest of my notes and musings behind the paywall, because in a year or so I will probably be embarrassed by them. Why? Because I do not see a road to success here. And yet Ive’s and Altman’s teams do, and they are smarter than I am.
The device aims to be “Siri but better”.
That should be easy. That is, right now, an incredibly low bar.
Yet without a screen even as a fallback, they have raised interaction design complexity nearly to the max. And God alone knows how battery interacts with “always-on” data collection where the data should be kept on-device for privacy reasons and all of this is processed by a GPT LLM MAMLM that has much more power than merely linguistic fluency interfacing with websearch and similar. And the required hardware power and performance per watt—aggressive hiring from Apple/Facebook shows intent to build a high-end hardware org fast, but there seem to be no chips that could legitimately even try to power this thing outside of Apple’s M-series.
We know something about Altman’s pitch to OpenAI staff: ship 100 million AI “companions,” positioned as a “third device” alongside phone/laptop, aware of surroundings and day-to-day experiences:
Ambition: 100M units shipped “faster than any company”—a manufacturing/logistics moonshot.
Role: complements phone/laptop as a distinct ambient companion, not a replacement.
“Aware of surroundings” implies persistent sensing and context memory.
Speed is part of the strategy signal—market capture before incumbents’ ecosystems adapt.
Design leadership (Ive) is a branding wedge against utilitarian AI hardware stigma.
Cloud-first “magic intelligence” locks product economics to datacenter build-out.
The strategic message is speed-to-scale and everyday embeddedness. Not replacing smartphones. It will do the small things that pulling out your smartphone creates too much cognitive, latency, and social friction to take care of. And yet relying on the cloud and its processors is, I think, an Achilles’ Heel, or would be if Apple were on the ball with respect to Siri as well as with respect to on-device privacy.
Remember Apple’s Newton? It was the iPhone, 14 years early. That’s seven Moore’s Law generations. A 20MHz ARM 610 vs. a 412MHz ARM1176JZF-S paired with a PowerVR MBX Lite GPU. The iPhone had roughly 80 times the performance, drawing twice as much power but with twice as much battery. And it was a phone rather than a (small) tablet.
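As a rough check on that comparison, the arithmetic can be spelled out. This is only a sketch: it uses the figures quoted above, the variable names are mine, and raw clock speed is a crude proxy for real performance.

```python
# Figures taken from the paragraph above; variable names are illustrative only.
years_between = 14   # Newton to iPhone, per the text
generations = 7      # "seven Moore's Law generations"
newton_mhz = 20      # ARM 610
iphone_mhz = 412     # ARM1176JZF-S

# Implied length of one generation, and the gain from seven doublings.
years_per_generation = years_between / generations
doubling_factor = 2 ** generations

# Gain from clock speed alone (ignores architecture, memory, and the GPU).
clock_ratio = iphone_mhz / newton_mhz

print(f"{years_per_generation:.0f} years per generation")   # 2 years per generation
print(f"{doubling_factor}x from {generations} doublings")    # 128x from 7 doublings
print(f"{clock_ratio:.1f}x from clock speed alone")          # 20.6x from clock speed alone
```

Seven clean doublings would give 128x, while clock speed alone gives about 21x, so a figure of roughly 80x sits between the two bounds.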
Consider the trio of devices that defined 2024’s hype-to-hangover cycle: Rabbit R1, Humane Ai Pin, and, in its own way, Apple’s Vision Pro. It feels to me the same—visions that are, right now, still out of reach of the hardware, both on-device and through the ether.
Rabbit and Humane, remember, tried to sell ambient, screenless assistance as liberation from app-choked phones. In practice, both shipped as undercooked companions that added friction rather than removing it. Rabbit’s “Large Action Model” promised to operate apps on your behalf, but early versions barely did anything beyond a thin Android app veneer; integrations were brittle, latency long, and security suspect. Humane’s lapel pin bet everything on voice plus a hand-projected laser UI. It was expensive upfront, taxed monthly, overheated, and struggled to authenticate or recognize reliably. The result of dropping the screen was not less cognitive load, but more: speaking to a device, waiting, clarifying, and hoping the cloud would oblige.
The Vision Pro suffers from the tyranny of latency (wireless, clouds, tracking), battery tethers, and social acceptability. For those seeking immersion, a $300 Quest was good enough at one-tenth the price.
Underneath these misfires sit the three structural constraints of:
Latency and reliability. If an assistant routes actions through the cloud with long tails of delay and failure, it is worse than the phone-in-hand. As Bronwyn Hall says: never push bytes down a wire or over the ether when you can avoid it; users punish roundtrips severely.
Unit economics of inference. “Always-on” sensing plus large-model inference is expensive. If the business model is monthly fees to subsidize server farms, adoption will stall unless the value delivered is daily and undeniable.
Behavior change cost. The smartphone won because it minimized context-switch costs: glance, tap, done. Devices that demand voice-only interaction, or eliminate the fallback screen, impose higher cognitive and social friction. Most people will not pay more to do less, more slowly.
A viable path likely looks less like a screenless talisman and more like three pieces in concert:
on-device models for ultra-low-latency tasks (speech, vision, intent),
judicious cloud escalation for heavier lifts, and
UI affordances that let users audit, interrupt, and override—i.e., an info-butler, not a “friend.”
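One way to picture how those three pieces might fit together is a small routing sketch. Everything in it is hypothetical: the task labels beyond speech, vision, and intent, the `route_request` function, and the `user_allows_cloud` flag are my own illustrations of the idea, not anything OpenAI or anyone else has described.

```python
from dataclasses import dataclass

# Tasks assumed cheap enough to run locally; taken from the list above.
ON_DEVICE_TASKS = {"speech", "vision", "intent"}


@dataclass
class Decision:
    task: str
    route: str   # "on_device", "cloud", or "ask_user"
    reason: str


# A plain list the user could inspect later: the "audit" affordance.
audit_log: list[Decision] = []


def route_request(task: str, user_allows_cloud: bool = True) -> Decision:
    """Hypothetical policy: local first, cloud only for heavier lifts,
    and never the cloud if the user has overridden it."""
    if task in ON_DEVICE_TASKS:
        decision = Decision(task, "on_device", "low-latency task handled locally")
    elif user_allows_cloud:
        decision = Decision(task, "cloud", "heavier lift escalated to the cloud")
    else:
        decision = Decision(task, "ask_user", "cloud escalation needs user approval")
    audit_log.append(decision)
    return decision


if __name__ == "__main__":
    route_request("speech")
    route_request("summarize_meeting")
    route_request("summarize_meeting", user_allows_cloud=False)
    for entry in audit_log:
        print(f"{entry.task}: {entry.route} ({entry.reason})")
```

The point of the sketch is the ordering of the checks: the reflexes stay on the device, the cloud is an escalation rather than the default, and the user can both see and veto what was sent where.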
If ambient assistants are to succeed, they must beat the smartphone at specific margins: zero-friction capture, context-aware reminders, hands-free micro-automations, and private, on-device memory. Otherwise, we have ornaments that narrate our day while we still reach for the slab to get real work done. The market has given its verdict on the first wave: don’t try to replace the phone; outcompete it exactly where it is worst.
To me, that does not look doable.
Yet.




I don’t understand the problem Ive & Altman are trying to solve, much less whether they can actually solve it. It seems managing the device would require more time & effort than going directly to the task.
Not long ago I saw a review of Apple’s new realtime translation software. Why should I rely on it? At present I cannot rely on Apple’s software to render accurately an English voicemail as English text.
I hereby nominate enshi**ification as word of the year.
I do not like this “thing”.
Why do I want “always on”?
As you say, the latency problems mean that it is not really always on.
Even if they resolve the latency / battery problem, the security issue looks menacing and will invite all sorts of things … the lapel pin trading crypto or gambling on the NFL without the user exactly knowing because the default setting requires opting out of many obscure bits?