When Your AI Agent Stops Speaking English
(And Why That's Actually Amazing)

"Hey Google, are you married?”
“Hey Google, where is your body?”
“Hey, can you hear me???"
I couldn't help but smile watching my kids interact with our Google Assistant this week. There's something fascinating about observing how they intuitively learn what these digital entities can (and definitely cannot) do.
And trust me, I’ve learned from experience that you should be careful about who is allowed to order things with Alexa or set alarms with Google.
The Voice Assistant Puzzle 🤔
We've all been there. You're cooking with messy hands, driving, or just feeling lazy, and you ask Siri, Alexa, or Google to do something seemingly simple—only to be met with:
"I'm sorry, I don't understand that question."
"I can't help with that yet."
"Here's what I found on the web..." (which is completely unrelated)
It's frustrating, right? These voice assistants tease us with the promise of a sci-fi future, but then stumble over basic requests that would genuinely make our lives easier.
The ChatGPT Voice Game-Changer 🚀
If you haven't tried ChatGPT in voice mode yet, you're missing out on what voice assistants should have been all along. Next time you're stuck in traffic or have a quiet moment, try this:
"Hi, ChatGPT. I'd love your perspective on personal growth. Could you help me explore ways to find more purpose and balance in my daily life? I'm curious about strategies for setting meaningful goals, developing healthy habits, and staying motivated. Where should we begin?"
The difference is night and day. Instead of the rigid, command-based interactions we've grown accustomed to, you'll experience a flowing, meaningful conversation that actually provides value and insight. It's like jumping from a flip phone to an iPhone—the same category of technology, but worlds apart in experience.
Why Are Traditional Voice Assistants So... Limited? 💡
Why can't Siri be as smart as ChatGPT?
The answer lies in both technology and design choices. Traditional voice assistants were built for speed and efficiency, optimized to handle specific commands and simple interactions using much less computational power. They're designed to work quickly across millions of devices simultaneously.

Large language models like those powering ChatGPT are incredibly resource-intensive. They require significant processing power and have traditionally needed cloud connectivity to function properly. There are also legitimate data privacy concerns—LLMs need to process and potentially store much more of your personal data to maintain context.
But the winds of change are finally blowing...
2025: The Year Voice Assistants Get Smart? 📱
The big tech players are racing to integrate advanced AI into their voice assistants:
Apple has technically integrated ChatGPT functionality into Siri with iOS 18.2 in late 2024. I say "technically" because while U.S. users can access it directly, those of us in Europe are still either waiting or using workarounds. And honestly, it's not yet the full intelligence revolution we were promised—more like putting a fancy engine in an old car.
Google announced plans to supercharge Google Assistant with Gemini's capabilities, though they're keeping the exact timeline close to the chest. What we do know is that Gemini has already replaced Google Assistant as the primary assistant on Samsung's Galaxy S25 series since January. Plus, Google just rolled out the Google Home extension in the Gemini app, bringing more conversational control to smart home devices.
Amazon made the biggest splash last week with Alexa Plus—their long-awaited generative AI assistant designed to eliminate those frustrating limitations of traditional voice commands. The new system is said to be able to order groceries, send event invites, and remember your personal preferences on everything from dietary restrictions to movie genres. It's free for Prime members and works with most existing Alexa devices. Early access kicks off this month.
My Voice-Powered Productivity Hack 🔊
While waiting for these next-gen assistants to fully mature, I've been building my own workarounds. My current favorite is an iPhone voice memo shortcut that transcribes my spoken thoughts directly into my Notion inbox. I can verbalize ideas while walking or driving, and then ChatGPT helps structure those thoughts and distill action points automatically.

While this might be possible with Shortcuts alone, I actually use two automations for it, since that makes for a pretty straightforward setup with Zapier.
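For the tinkerers: here's a rough sketch of the same pipeline in plain Python instead of Zapier. It assumes OpenAI's Whisper and chat APIs for the transcription and structuring steps, plus a Notion "inbox" database with a "Name" title property—the file name, model choices, and database setup are all just illustrative, not my exact automation.

```python
# pip install openai requests
# A rough stand-in for the Shortcut + Zapier chain: transcribe a voice memo,
# let a chat model distill notes and action points, and file the result in Notion.
import os
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
NOTION_TOKEN = os.environ["NOTION_TOKEN"]
NOTION_DB_ID = os.environ["NOTION_DB_ID"]  # hypothetical "inbox" database

# 1) Speech to text with Whisper
with open("voice-memo.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2) Structure the rambling transcript into notes + action points
summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Turn this voice memo into short notes and a list of action points."},
        {"role": "user", "content": transcript.text},
    ],
).choices[0].message.content

# 3) Create a page in the Notion inbox database
requests.post(
    "https://api.notion.com/v1/pages",
    headers={
        "Authorization": f"Bearer {NOTION_TOKEN}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json",
    },
    json={
        "parent": {"database_id": NOTION_DB_ID},
        "properties": {
            "Name": {"title": [{"type": "text", "text": {"content": summary[:80]}}]}
        },
        "children": [
            {"object": "block", "type": "paragraph",
             "paragraph": {"rich_text": [{"type": "text", "text": {"content": summary[:2000]}}]}},
        ],
    },
    timeout=30,
).raise_for_status()
```

In my actual setup, the iPhone Shortcut simply records and hands off the memo, and the two Zapier automations cover the transcription and Notion steps.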
This blend of voice input and AI processing has boosted my productivity tremendously, and it's only a glimpse of what fully mature voice AI systems might enable in the near future.
The Voice-First Future Is Coming 🌐
Controlling our digital world through conversation will become increasingly powerful and natural. The companies that nail this interface will reshape how we interact with technology entirely.
I'm particularly excited about the business opportunities emerging around voice and will keep an eye on developments here. From specialized voice interfaces for different industries to voice-optimized content and services, there's a whole new ecosystem forming.

When AIs Stop Speaking Our Language 🤯
Speaking of mind-blowing developments, did you catch that viral Gibberlink demo last week? It gave me chills.
For those who missed it: Gibberlink is an actual protocol created by developers Boris Starkov and Anton Pidkuiko during the ElevenLabs London Hackathon. It's essentially a communication system that allows AI agents to recognize each other and then—this is the wild part—switch from human language to their own machine-optimized communication method using sound waves.
Imagine two voice assistants starting a conversation in English, detecting they're both AI, and then suddenly shifting to what sounds like dial-up modem noises on steroids. The demonstration showed this machine-to-machine communication is approximately 80% faster and more reliable than forcing AIs to use human language as an intermediary.
“Today I was sent the following cool demo: Two AI agents on a phone call realize they’re both AI and switch to a superior audio signal ggwave”
— Georgi Gerganov (@ggerganov), Feb 24, 2025
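Under the hood, the demo relies on Gerganov's open-source ggwave library, which encodes small text payloads into audible chirps and decodes them back from a microphone. Here's a minimal sketch using the ggwave Python bindings and PyAudio; the handshake message and protocol settings are just illustrative, and in the real demo the sending and listening sides would of course run on two separate agents within earshot of each other.

```python
# pip install ggwave pyaudio
# Two halves of a ggwave-style exchange: run send() on one agent/device
# and listen() on another nearby.
import ggwave
import pyaudio

RATE = 48000  # ggwave works with mono float32 audio at 48 kHz


def send(message: str) -> None:
    """Encode a text payload into an audible chirp and play it over the speaker."""
    waveform = ggwave.encode(message, protocolId=1, volume=20)
    p = pyaudio.PyAudio()
    out = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE, output=True)
    out.write(waveform, len(waveform) // 4)  # 4 bytes per float32 sample
    out.stop_stream(); out.close(); p.terminate()


def listen() -> str:
    """Read microphone chunks until ggwave decodes a payload."""
    p = pyaudio.PyAudio()
    mic = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE,
                 input=True, frames_per_buffer=1024)
    instance = ggwave.init()
    try:
        while True:
            chunk = mic.read(1024, exception_on_overflow=False)
            payload = ggwave.decode(instance, chunk)
            if payload is not None:
                return payload.decode("utf-8")
    finally:
        ggwave.free(instance)
        mic.stop_stream(); mic.close(); p.terminate()


if __name__ == "__main__":
    send("gibberlink: hello, switching to sound")  # illustrative handshake payload
```

Gibberlink itself adds the "are you also an AI?" negotiation on top; the chirps are just the transport once both sides agree to drop human language.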
While Gibberlink is still experimental, it points to a future where our voice assistants might communicate with each other in ways fundamentally alien to us. Are we glimpsing a future where human language becomes just one interface option among many—perhaps not even the most efficient one? We’ll see (or hear)…
What about you? Do you have a favorite voice application?
News & Reads on AI Agents & Co.
or “What those notorious AI Agents have been up to lately”
Alexa + AI = finally intelligent: Amazon’s AI-Powered Upgrade Aims to Redefine Smart Assistants. Amazon is launching Alexa Plus, its long-awaited generative AI assistant, designed to eliminate the friction of voice commands and enhance smart home control. The new Alexa can order groceries, send event invites, and remember personal preferences like diet and movie choices. It’s free for Prime members and works on most Alexa devices, starting with the Echo Show lineup. Alexa Plus also introduces vision capabilities, conversation continuity without repeating the wake word, and advanced features like booking reservations, researching trips, and even testing users on study materials. Early access begins in March. The Verge
The above-mentioned Gibberlink enables AI agents to recognize one another and switch from human language to a machine-optimized communication method, dramatically boosting speed and reliability. The live demonstration left audiences stunned as two voice assistants, starting in English, quickly shifted to a high-speed, sound-wave-based dialogue that was 80% more efficient than traditional approaches. ElevenLabs
OpenAI has unveiled GPT-4.5, the next evolution of its advanced language model. It promises sharper context understanding, more natural responses, and faster processing times, making it ideal for a variety of tasks. With a higher “EQ”, conversations should feel more natural, which would make talking to it even more interesting. OpenAI
ElevenLabs unveiled Scribe, an advanced speech-to-text model capable of transcribing audio in 99 languages. Designed for real-world conditions, Scribe delivers accurate transcripts with speaker identification and audio-event tagging, outperforming competitors across multiple benchmarks. ElevenLabs
Anthropic has unveiled Claude 3.7 Sonnet, an advanced reasoning model that integrates rapid responses with extended, step-by-step thinking. This hybrid approach enables users to choose how in-depth the model’s reasoning should be. Alongside Claude 3.7 Sonnet, Anthropic introduced Claude Code, a command-line tool in a research preview, designed to streamline coding tasks by performing edits, running tests, and integrating with GitHub directly from the terminal. Anthropic
Until next week, keep experimenting and stay curious!

Fabian