From Voice to Value: How Enterprises Turn Conversational AI into Outcomes

Discussion with Yan Zhang

Yan Zhang brings a unique blend of operating and investing experience in artificial intelligence. As Chief Operating Officer at London-based PolyAI, he played a central role in scaling enterprise voice assistants across global markets. Earlier, he founded three consumer-technology ventures, achieving two successful exits. He now serves as an adviser to a U.S. venture fund focused on applied and enterprise AI in Europe. Having operated on both sides of the table, building companies and evaluating investments, he offers a rare dual perspective. In this conversation, he discusses the evolution of voice AI, the persistent bottlenecks in enterprise back-end systems, how regulated sectors can balance efficiency with trust, what it takes for startups to secure large enterprise customers, and why aligning incentives often determines whether pilots develop into scalable programs.

The Evolution of Voice AI: From Recognition to Agentic Resolution

The first wave of “voice AI” in customer service was narrow command-and-control: think single-word routing (“say billing”). It was useful but brittle, incapable of handling the ambiguity and nuance that define real customer intent. The second wave expanded into intent capture over full sentences, a leap forward that still struggled with long, multi-clause queries and edge-case phrasing. The real inflection point arrived with transformer architectures in the late 2010s, which allowed systems to map language into meaning rather than chase keywords. As Zhang explains, this moved teams “from listening for words to classifying a sentence into a semantic vector,” enabling assistants to understand what customers meant, not just what they said.
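
To make the shift concrete, here is a minimal sketch of semantic intent classification, assuming a generic sentence-embedding model (the model name and example utterances are illustrative, not any vendor's implementation): the caller's sentence is compared to example utterances in vector space instead of being matched against keywords.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any sentence-embedding model works

# Illustrative model choice; swap in whatever encoder your stack already uses.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A few example utterances per intent; classification compares meanings, not keywords.
INTENT_EXAMPLES = {
    "billing":      ["I think I was charged twice", "question about my invoice"],
    "reservation":  ["I need to move my booking", "change my check-in date"],
    "cancellation": ["please cancel my order", "I no longer need this service"],
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_intent(utterance: str) -> str:
    """Return the intent whose examples sit closest to the utterance in embedding space."""
    query = encoder.encode(utterance)
    best_intent, best_score = None, -1.0
    for intent, examples in INTENT_EXAMPLES.items():
        score = max(cosine(query, encoder.encode(e)) for e in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(classify_intent("there's a duplicate charge on my card"))  # -> "billing"
```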

“Modern voice AI doesn’t just recognize what you say—it understands what you mean and generates context-specific responses that can guide you through a problem.” Generative AI pushed the third wave: assistants that not only understand varied requests but can respond flexibly within business workflows, even when customers veer off the “happy path.” A hotel-booking bot no longer needs every possible Q&A pre-authored; it can consult room plans and answer a niche question (“Does the second bathroom have a shower or just a bathtub?”) without an engineer anticipating that branch. The result is higher first-contact resolution, fewer forced escalations, and the ability to serve complex intents with conversational fluidity.

The next threshold is agentic completion: connecting assistants to enterprise systems to perform secure read/write actions (change an address, process a refund, transfer funds) so customers leave the conversation with the task done, not just “ticketed.” As Zhang notes: “These agents will soon be able to plug directly into enterprise backends and actually carry out the changes customers request.”
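
What that looks like on the integration side can be sketched roughly as follows; the backend functions (update_address, issue_refund) are hypothetical stand-ins for real CRM or billing calls. The point is that every action the assistant can take is registered with an explicit read/write scope, so write operations can be gated and audited rather than executed free-form.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentAction:
    name: str
    scope: str          # "read" or "write"
    handler: Callable   # call into the enterprise backend

# Hypothetical backend calls; real ones would hit the CRM, billing, or core system.
def update_address(customer_id: str, new_address: str) -> str:
    return f"address for {customer_id} updated to {new_address}"

def issue_refund(customer_id: str, amount: float) -> str:
    return f"refund of {amount:.2f} queued for {customer_id}"

REGISTRY = {
    "update_address": AgentAction("update_address", "write", update_address),
    "issue_refund":   AgentAction("issue_refund", "write", issue_refund),
}

def execute(action_name: str, caller_verified: bool, **kwargs) -> str:
    """Run a requested action, but only let verified callers trigger write operations."""
    action = REGISTRY[action_name]
    if action.scope == "write" and not caller_verified:
        return "escalate: write action requires verified caller"
    return action.handler(**kwargs)

print(execute("issue_refund", caller_verified=True, customer_id="C-1042", amount=25.0))
```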

The Backend Bottleneck: Why Automation Lags Conversation

Conversation quality is no longer the main constraint. Modern models handle accents, hesitations, and noisy audio remarkably well. The limiting factor is enterprise system readiness: the ability to integrate safely with heterogeneous, often decades-old stacks. Digital natives built in the cloud can expose clean, well-documented APIs; many incumbents run mission-critical processes on fragmented stacks inherited through M&A, spread across multiple systems of record and databases. “The biggest obstacle isn’t the AI itself, it’s that many enterprise backends simply aren’t ready for automation.”

When backends are brittle or undocumented, even the smartest agent becomes a glorified help-desk intake. The practical path forward is a two-speed transformation: (1) prioritizing a handful of high-volume, high-ROI customer journeys and investing in the middleware, identity, and policy controls needed for safe read/write access, while (2) pursuing longer-horizon cloud migration and data unification. This preserves momentum without waiting for a multi-year re-platform to finish.

As Zhang aptly puts it, “If AI doesn’t work within your organization, it’s most likely an incentive problem before it’s a technology problem.” That underscores how infrastructure is not just about code; it must reflect organizational readiness. The architecture should assume heterogeneity: event buses that decouple assistants from systems of record, clear access rules that keep systems safe, and monitoring so every agent action can be reviewed by a human.
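
A rough sketch of that shape, using illustrative names and in-memory stand-ins for the bus and audit store: the assistant publishes requested actions onto an event bus, an access-rule check decides whether the target system of record may be touched, and every decision lands in an audit trail a human can review.

```python
import queue
from datetime import datetime, timezone

# In-memory stand-ins; production systems would use a real broker and durable logging.
event_bus = queue.Queue()
audit_log = []

ALLOWED_ACTIONS = {                 # simple access rules per system of record
    "crm":     {"read_profile", "update_address"},
    "billing": {"read_invoice"},
}

def publish(system: str, action: str, payload: dict) -> None:
    event_bus.put({"system": system, "action": action, "payload": payload})

def process_events() -> None:
    """Drain the bus, enforce access rules, and record every decision for human review."""
    while not event_bus.empty():
        event = event_bus.get()
        allowed = event["action"] in ALLOWED_ACTIONS.get(event["system"], set())
        audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "decision": "executed" if allowed else "blocked",
        })
        # An allowed action would be forwarded to the system of record here.

publish("crm", "update_address", {"customer_id": "C-1042", "address": "1 Main St"})
publish("billing", "issue_refund", {"customer_id": "C-1042", "amount": 25.0})
process_events()
for entry in audit_log:
    print(entry["decision"], entry["event"]["action"])
```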

Building for Regulated Sectors: Safety and Human Touch

Financial services adopted voice AI early, often pairing it with voice biometrics to streamline verification. But progress in generative audio has created a paradox: the very channel that improved CX now introduces a new attack vector if controls lag behind model capabilities. “Voice biometrics made banking more convenient, but with deepfakes advancing so quickly, they’ve also become a potential vulnerability.”

Designing for regulated industries means leading with risk-segmented experiences. Use passive verification and automation for low-risk intents (balance inquiries, appointment scheduling), while gating high-risk actions (fund transfers, card closures) behind multi-factor verification, device reputation, step-up checks, and human escalation. 
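
One way to encode that segmentation is a small policy table; the tiers and checks below are illustrative rather than any bank's actual policy. Low-risk intents run fully automated behind passive verification, while high-risk intents require step-up checks or hand-off to a human.

```python
# Illustrative risk tiers; real policies would come from the institution's risk function.
INTENT_POLICY = {
    "balance_inquiry":     {"risk": "low",  "verification": ["passive_voice"],            "human_fallback": False},
    "appointment_booking": {"risk": "low",  "verification": ["passive_voice"],            "human_fallback": False},
    "card_closure":        {"risk": "high", "verification": ["otp", "device_reputation"], "human_fallback": True},
    "fund_transfer":       {"risk": "high", "verification": ["otp", "step_up_question"],  "human_fallback": True},
}

def route(intent: str, checks_passed: set) -> str:
    """Decide whether to automate, request step-up verification, or hand off to a human."""
    policy = INTENT_POLICY.get(intent)
    if policy is None:
        return "human"                       # unknown intents never get automated
    missing = [c for c in policy["verification"] if c not in checks_passed]
    if not missing:
        return "automate"
    return "human" if policy["human_fallback"] else f"step_up:{missing[0]}"

print(route("balance_inquiry", {"passive_voice"}))   # -> automate
print(route("fund_transfer",   {"otp"}))             # -> human (step-up incomplete)
```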

Treat the assistant as a selective load-balancer for humanity: the point is not “bots instead of people,” but bots for the straightforward, so people can serve the complex. “The real value isn’t boasting that you have the ‘best bot.’ It’s using AI to handle routine cases, so human agents can focus where they’re needed most.”

Finally, remember that early adopters carry tech debt. Controls designed for a 2017 voice stack may not withstand a 2025 threat landscape. Periodically re-baseline verification, content filters, and fraud models against current adversarial capabilities, not yesterday’s benchmark.

Selling Outcomes, Not Just Technology

AI frequently automates most of a workflow, but not all of it. In a 15-step claims process, a model might excel at 12 steps while three require information it cannot access, a human judgment call, or a policy-sensitive decision. If the vendor sells a “tool” that forces the buyer to reorganize staff, reassign work, or hire an overseer, deals stall, because the internal buyer rarely controls headcount or operating model changes. “Sometimes the smartest move is to package your solution as a tech-enabled service, absorbing the complexity, so your client doesn’t have to.”

A stronger motion is selling the outcome: deliver the end-to-end service with humans in the loop where needed, and price it against the value unlocked. This “tech-enabled service” approach absorbs organizational friction that would otherwise kill the deal. It also earns credibility with the earliest reference customers, the logos that influence the next hundred enterprises. “Winning those first ten customers often depends on borrowed credibility: partners, advisors, accelerators, until you can prove you belong in the enterprise market.”

For founders, the decision isn’t dogma (pure SaaS vs. services). It’s sequencing. Use services to win early trust and data, productize repeatable components, and migrate margins to software as back-end readiness and customer change-management catch up.

From Pilots to Scale: The Incentive Dilemma

Why do so many enterprise AI initiatives get stuck in pilot purgatory? In the second half of 2023 and through 2024, boards and CEOs pushed “do something with AI,” spawning experiments that sounded important but lacked business-critical urgency. Even when the tech performs, organizational incentives can be misaligned, especially when adoption implies job redesigns, reskilling, or headcount changes the sponsor can’t authorize. “AI initiatives often fail not because the systems are flawed, but because the organization isn’t structured to support them.”

The remedy is to anchor pilots to P&L-visible outcomes (hard capacity relief, cycle-time cuts, SLA gains) and design adoption paths that avoid organization-breaking asks. If the solution requires the customer to “fire three people and hire two new ones,” it will die in the seams between departments. Either restructure the offer (an outcome, not a tool) or target use cases where stakeholders win together (e.g., deflection paired with improved NPS and faster refunds). “If AI isn’t working in your company, check the incentives first. The tech will only get better, but incentives are often harder to change.”

In practice, choose one or two journeys where the economic case survives scrutiny, align incentives across Ops, Risk, and CX, and make the first deployment boring: reliable, measurable, and obviously worth expanding.

Looking Forward to an Era of Autonomous Agents

AI has always promised automation; the difference now is robustness. As automation becomes more resilient, it threatens not only manual workflows but also the stickiness of systems of record. If it becomes easier to move data and actions across platforms, opportunities emerge to build new foundational systems atop smarter orchestration layers. “The next step is composite agents, specialized bots stitched together into fully autonomous digital operating units.”

Zhang envisions composite agents (specialized agents for sales, underwriting, claims, and customer care) stitched into coordinated, auditable “digital operating units.” The hard problems will be coordination economics (how agents account for, pay for, and bill value to one another), inter-agent protocols, and enterprise-grade controls that keep human accountability in the loop. Those who solve these primitives won’t just automate tasks; they’ll define the infrastructure of the agentic economy. “The real breakthrough will be figuring out how these agents coordinate; how they exchange value, pay each other, and still stay accountable.”
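
As a toy illustration of what such a unit might look like structurally (the agent names, costs, and ledger are hypothetical): specialized agents each handle one step, a coordinator sequences the hand-offs, and an internal ledger records who did what and at what notional cost, which is precisely the coordination-economics question Zhang flags.

```python
from typing import Callable

# Hypothetical specialized agents; each returns its output plus a notional cost
# so the unit can account for value exchanged between agents.
def sales_agent(lead: dict):       return {"quote": 1200.0}, 0.50
def underwriting_agent(quote):     return {"approved": quote["quote"] < 5000}, 1.20
def claims_agent(decision):        return {"policy_issued": decision["approved"]}, 0.80

PIPELINE: list[tuple[str, Callable]] = [
    ("sales", sales_agent),
    ("underwriting", underwriting_agent),
    ("claims", claims_agent),
]

def run_unit(payload: dict):
    """Coordinate specialized agents and keep an auditable ledger of each hand-off."""
    ledger = []
    for name, agent in PIPELINE:
        payload, cost = agent(payload)
        ledger.append({"agent": name, "output": payload, "cost": cost})
    return payload, ledger

result, ledger = run_unit({"lead": "ACME Corp"})
print(result)
for entry in ledger:
    print(f'{entry["agent"]}: cost {entry["cost"]:.2f}')
```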

For executives, the strategic takeaway is simple: invest in the connective tissue (identity, data quality, policy enforcement, observability), because it is the substrate on which composite agents, and the next generation of systems of record, will run.

Key Takeaways for Executive Leaders

Before wrapping up the conversation, Zhang emphasized three essential lessons for executives navigating applied AI. These insights come directly from his reflections on why some initiatives stall while others succeed.

  • Start with incentives, not technology. If AI isn’t working in your organization, it’s usually because incentives are misaligned, not because the tech is inadequate. Technology will continue to improve, but misaligned incentives can block adoption indefinitely.
  • Focus on delivering results, not just tools. Avoid creating organizational headaches for your clients. Packaging AI as an outcome, rather than a tool that forces restructuring, helps bypass incentive barriers and accelerates adoption.
  • Build for the agentic future. Don’t just automate existing tasks or replicate SaaS models at lower cost. Look ahead to the agentic economy, where AI agents perform core economic functions. Foundational infrastructure, like agent payment and coordination systems, could become the bedrock of the next wave of enterprise software.
