How Voice AI Connects to Your ERP in Real Time

Chris VanIttersum
February 2026 | 7 min read

Voice AI is only as useful as the data it can access. A voice assistant that reads from a stale cache is a liability. One that queries your ERP in real time—pulling live inventory counts, customer-specific pricing, and current order status—is a tool that actually works. The difference is integration architecture, and getting it right is the hardest part of any enterprise voice deployment.

The Latency Constraint

Conversational AI research consistently identifies 2 seconds as the maximum acceptable response time before a voice interaction feels broken. Users expect responses in a human-like rhythm. Sub-100ms voice synthesis is now achievable, according to AgentVoice's 2025 market analysis, but synthesis is only one stage: the backend query and every other step have to fit inside the same 2-second window.

That 2-second budget gets divided across the full stack:

  • Speech-to-text: 200–500ms for converting audio to text, depending on utterance length and model
  • Intent classification and query construction: 50–150ms
  • Backend system query: 100–500ms, depending on the system and query complexity
  • Response generation: 100–200ms for natural language output
  • Text-to-speech: 100–300ms for voice synthesis

The math is tight. Summed at the upper end of each range, the stages above consume 1,650ms, which leaves roughly 350ms for network transit and everything else. Every millisecond of unnecessary latency in the integration layer (authentication overhead, middleware hops, inefficient queries) directly degrades the user experience. Synthflow reports achieving sub-500ms end-to-end latency in production deployments, but that requires deliberate architectural choices at every layer.
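To make the budget arithmetic concrete, here is a minimal sketch that sums the per-stage ranges listed above. It assumes the stages run strictly one after another, which is a simplification: streaming pipelines overlap some of them. The stage names are illustrative labels, not identifiers from any product.

```python
# Per-stage latency ranges in milliseconds, taken from the list above.
BUDGET_MS = 2000

STAGES = {
    "speech_to_text": (200, 500),
    "intent_and_query_construction": (50, 150),
    "backend_query": (100, 500),
    "response_generation": (100, 200),
    "text_to_speech": (100, 300),
}

def total_latency(stages):
    """Return (best_case_ms, worst_case_ms) summed across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst

best, worst = total_latency(STAGES)
headroom = BUDGET_MS - worst
print(f"best case: {best} ms, worst case: {worst} ms, headroom: {headroom} ms")
# prints "best case: 550 ms, worst case: 1650 ms, headroom: 350 ms"
```

The 350ms of worst-case headroom is what remains for network round trips, authentication, and any middleware hop, so each of those needs its own line in the budget.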

Real-Time vs. Cached: A Practical Example

A field rep asks: "Is item 4521 in stock?"

In a cached architecture—one that syncs inventory data hourly—the system might report 24 units available based on data from 45 minutes ago. But 20 units shipped in the interim. Actual availability: 4. The rep promises availability to a customer. The order creates a backorder. Service failure.

In a real-time architecture, the system queries the ERP directly and reports 4 units. The rep has accurate information and can have the right conversation—including offering alternatives or setting delivery expectations.

The engineering required to make this real-time is significant. It demands:

  • A query architecture that doesn't add perceptible latency to the conversation
  • Connection resilience for network hiccups and system slowdowns
  • Authentication that works for voice sessions without manual login flows
  • A selective caching strategy—product catalogs and customer profiles can be cached; inventory levels and pricing cannot
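The selective caching idea in the last bullet can be sketched as a cache with a per-category time-to-live, where a TTL of zero means "always query the ERP". The category names, the TTL values, and the `fetch` callable are all assumptions made for illustration; real values depend on how quickly each kind of data changes in a given business.

```python
import time

# Hypothetical data categories and TTLs; 0 means "never cache, always query live".
TTL_SECONDS = {
    "product_catalog": 3600,
    "customer_profile": 900,
    "inventory_level": 0,
    "pricing": 0,
}

class SelectiveCache:
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch   # callable(kind, key) -> value; stands in for the ERP query
        self._clock = clock
        self._store = {}      # (kind, key) -> (expires_at, value)

    def get(self, kind, key):
        ttl = TTL_SECONDS[kind]
        if ttl > 0:
            entry = self._store.get((kind, key))
            if entry and entry[0] > self._clock():
                return entry[1]             # still fresh: serve from cache
        value = self._fetch(kind, key)      # live query
        if ttl > 0:
            self._store[(kind, key)] = (self._clock() + ttl, value)
        return value

# Usage with a fake ERP that records every live query it receives.
calls = []
def fake_erp(kind, key):
    calls.append((kind, key))
    return f"{kind}:{key}"

cache = SelectiveCache(fake_erp)
cache.get("product_catalog", "4521")
cache.get("product_catalog", "4521")   # served from cache
cache.get("inventory_level", "4521")
cache.get("inventory_level", "4521")   # queried live again
print(len(calls))  # prints 3
```

Keeping the cacheable and non-cacheable categories in one table makes the policy easy to review with the people who own the data.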

Successful voice AI deployment hinges on seamless integration with an organization's existing IT landscape. This often catalyzes broader digital transformation by forcing the creation of a modern, unified API layer for legacy systems.

— Geodesic Capital, Voice AI: The Enterprise Primer, November 2025

Bi-Directional Integration

Most voice AI implementations are read-only. They can pull data but not push changes back. This limits utility to lookups—useful, but not transformational.

True bi-directional integration means the voice agent can both query and update:

Read operations: inventory checks, order status, pricing lookups, customer history, account balances, delivery tracking.

Write operations: order creation, CRM note updates, delivery schedule changes, task creation, credit requests, activity logging.

The write side is harder to implement safely. It requires:

  • Confirmation flows. "I'm about to place an order for 50 units of item 7845 at $2,340. Should I proceed?" Voice-initiated transactions need explicit confirmation before execution.
  • Rollback capability. If something fails mid-transaction—network drops, system timeout, conflicting update—the system needs to fail gracefully without leaving data in an inconsistent state.
  • Audit trails. Every voice-initiated change must be logged with timestamp, user identity, session context, and the specific action taken. This is a compliance requirement in most industries.
  • Permission inheritance. The voice agent should respect the same access controls as any other interface. A customer service rep should not be able to reach data through voice that they cannot reach through the web application.
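A rough sketch of how the confirmation and audit requirements above might fit together. The `submit` callable stands in for whatever creates the order in the ERP and is assumed to be atomic (it either succeeds or raises without leaving partial writes); that assumption, the field names, and the list of affirmative replies are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Replies treated as explicit confirmation (an illustrative, incomplete set).
AFFIRMATIVE = {"yes", "confirm", "go ahead", "proceed"}

@dataclass
class PendingOrder:
    user_id: str
    session_id: str
    item: str
    quantity: int
    total_usd: int  # whole dollars, to keep the sketch simple

    def prompt(self) -> str:
        return (f"I'm about to place an order for {self.quantity} units of "
                f"item {self.item} at ${self.total_usd:,}. Should I proceed?")

def execute_with_confirmation(order, spoken_reply, submit, audit_log):
    """Call submit(order) only after explicit confirmation; log every outcome."""
    if spoken_reply.strip().lower() not in AFFIRMATIVE:
        outcome = "cancelled"
    else:
        try:
            submit(order)  # assumed atomic: raises without partial writes
            outcome = "committed"
        except Exception as exc:
            outcome = f"failed: {exc}"
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": order.user_id,
        "session_id": order.session_id,
        "action": f"create_order item={order.item} qty={order.quantity}",
        "outcome": outcome,
    })
    return outcome

order = PendingOrder("rep-17", "sess-42", "7845", 50, 2340)
audit_log, submitted = [], []
print(order.prompt())
print(execute_with_confirmation(order, "No, wait", submitted.append, audit_log))  # cancelled
print(execute_with_confirmation(order, "Yes", submitted.append, audit_log))       # committed
```

Anything other than a recognized confirmation cancels the write, and the cancelled attempt is logged alongside the committed one, so the audit trail covers what was declined as well as what was executed.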

Connecting to Legacy ERPs

This is where theory meets reality. Most mid-market distributors aren't running API-first cloud ERPs. Panorama Consulting's ERP research consistently shows the average mid-market system is over a decade old. Many are running platforms designed before voice AI was conceivable.

Integration patterns differ by system generation:

Modern cloud ERPs (NetSuite, Acumatica, Oracle Cloud) expose REST APIs with standard authentication. Integration is straightforward—authenticate, query, parse responses. The main constraints are API rate limits and the occasional gap in endpoint coverage.

Legacy systems with retrofitted APIs (older Epicor, Infor, some SAP configurations) have API layers added after the fact. Coverage is uneven—common operations like order creation and inventory lookup are usually exposed, but edge cases may require workarounds.

Legacy systems without APIs (proprietary systems, green-screen terminals) require middleware. Options include database-level integration (direct queries against the underlying database—fast but fragile), file-based integration (batch exports and imports—reliable but slow), or screen-scraping wrappers that simulate user interactions. None of these are elegant. All of them work in production when designed carefully.

The middleware landscape has improved significantly. Platforms like Workato, Boomi, and MuleSoft offer pre-built connectors for dozens of ERP systems. What used to be a custom six-month integration project can often be accomplished in weeks using these platforms. The tradeoff is an additional latency hop—typically 50–150ms—which needs to be factored into the 2-second response budget.
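One way to keep the voice layer independent of the system generation behind it is a small adapter interface, sketched below. The `InventorySource` protocol, the `/inventory/{item}` path, the `on_hand` field, and the CSV column names are all hypothetical; a real integration would use whatever the specific ERP or export actually provides.

```python
import csv
import io
from typing import Protocol

class InventorySource(Protocol):
    def on_hand(self, item: str) -> int: ...

class RestInventory:
    """Cloud ERP case: wraps an injected HTTP helper (hypothetical get_json)."""
    def __init__(self, get_json):
        self._get_json = get_json

    def on_hand(self, item: str) -> int:
        return int(self._get_json(f"/inventory/{item}")["on_hand"])

class FileExportInventory:
    """No-API case: reads the most recent batch export, so data is only as fresh as the export."""
    def __init__(self, csv_text: str):
        rows = csv.DictReader(io.StringIO(csv_text))
        self._levels = {row["item"]: int(row["on_hand"]) for row in rows}

    def on_hand(self, item: str) -> int:
        return self._levels[item]

def answer_stock_question(source: InventorySource, item: str) -> str:
    return f"Item {item}: {source.on_hand(item)} units on hand."

rest = RestInventory(lambda path: {"on_hand": 4})            # stubbed HTTP call
export = FileExportInventory("item,on_hand\n4521,24\n")      # stale batch data
print(answer_stock_question(rest, "4521"))
print(answer_stock_question(export, "4521"))
```

The two adapters give different answers for the same item here on purpose: the interface hides how the data is fetched, but it cannot hide how fresh it is.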

Data Consistency and Race Conditions

When voice AI both reads and writes, the standard distributed systems challenges apply. The most common in distribution: inventory race conditions.

If the voice agent reports "4 units in stock" and a warehouse worker simultaneously picks 3 for another order, only 1 unit remains. A voice-initiated order for 4 units placed on the strength of that answer would oversell by 3. Good architecture handles this through:

  • Optimistic locking: Verify the data hasn't changed between read and write
  • Conservative availability: Report available-to-promise quantities that account for in-process transactions, not raw inventory counts
  • Conflict detection: When two updates collide, flag for human resolution rather than silently overwriting
  • Transaction atomicity: Voice-initiated changes complete entirely or roll back entirely—no partial states
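The first two techniques can be illustrated with a small in-memory sketch: a version number for optimistic locking, and an available-to-promise figure that subtracts in-process allocations. The class and method names are hypothetical, and the sketch is not thread-safe. In a real database the version check and the update would have to happen in one atomic statement or transaction.

```python
class StaleReadError(Exception):
    """The record changed between the caller's read and its write."""

class InventoryRecord:
    def __init__(self, on_hand: int):
        self.on_hand = on_hand
        self.allocated = 0   # units committed to in-process orders
        self.version = 0     # incremented on every successful write

    def available_to_promise(self) -> int:
        return self.on_hand - self.allocated

    def read(self):
        """Return the promiseable quantity plus the version it was read at."""
        return self.available_to_promise(), self.version

    def allocate(self, quantity: int, expected_version: int) -> None:
        if expected_version != self.version:
            raise StaleReadError("inventory changed since it was read")
        if quantity > self.available_to_promise():
            raise ValueError("insufficient available-to-promise quantity")
        self.allocated += quantity
        self.version += 1

record = InventoryRecord(on_hand=4)
atp, version = record.read()          # voice agent reads: 4 available
record.allocate(3, record.version)    # meanwhile, 3 units are picked elsewhere
try:
    record.allocate(4, version)       # voice order written against the stale read
    conflict = False
except StaleReadError:
    conflict = True
print(conflict, record.available_to_promise())  # prints "True 1"
```

On a conflict the agent can re-read and say something like "only 1 unit is available now" instead of placing an order that cannot be filled.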

These are solved problems in database engineering. The challenge is applying them correctly in a voice context where the user expects a fast, natural response—not a "please wait while I verify" pause.

Security Considerations

Voice-to-ERP integration opens an attack surface that doesn't exist with traditional interfaces. Key concerns:

Authentication. How does the voice system verify the caller's identity? Options include caller ID matching (convenient but spoofable), voice biometrics (increasingly reliable), PIN verification (secure but adds friction), and multi-factor approaches that combine methods.

Authorization. Voice sessions should inherit user permissions from the identity system. A rep who can't view certain customer data in the application shouldn't be able to access it through voice.

Data exposure. Voice responses can be overheard. Sensitive information—customer credit balances, payment details, competitive pricing—may need additional confirmation before being spoken aloud. "I have your account balance. Can you confirm the last four digits of your account number before I share that?"

Audit completeness. Every query and update initiated through voice must be traceable to a specific session, user identity, and timestamp. This isn't optional—it's a compliance requirement in most B2B contexts.
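The authorization, data exposure, and audit points above can be combined in a single gate that every spoken answer passes through. This is a minimal sketch under assumed names: the `session` dictionary shape, the field names, and the set of sensitive fields are hypothetical, and in practice the permissions would come from the organization's identity system.

```python
from datetime import datetime, timezone

# Fields that need an extra identity check before being spoken aloud (illustrative).
SENSITIVE_FIELDS = {"credit_balance", "payment_details", "competitive_pricing"}

def speak_field(session, field, value, audit_log):
    """Decide what the agent may say for one field, and log the decision."""
    if field not in session["permissions"]:
        decision = "denied"
        reply = "You don't have access to that information."
    elif field in SENSITIVE_FIELDS and not session["identity_confirmed"]:
        decision = "challenge"
        reply = ("Can you confirm the last four digits of your account "
                 "number before I share that?")
    else:
        decision = "spoken"
        reply = f"The {field.replace('_', ' ')} is {value}."
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": session["user_id"],
        "session_id": session["session_id"],
        "field": field,
        "decision": decision,
    })
    return reply

session = {
    "user_id": "rep-17",
    "session_id": "sess-42",
    "permissions": {"order_status", "credit_balance"},
    "identity_confirmed": False,
}
log = []
r1 = speak_field(session, "payment_details", "on file", log)   # not permitted
r2 = speak_field(session, "credit_balance", "$12,400", log)    # permitted, needs challenge
session["identity_confirmed"] = True
r3 = speak_field(session, "credit_balance", "$12,400", log)    # now spoken
print([entry["decision"] for entry in log])
```

Note that the audit entry records the field and the decision but not the value itself, so the log does not become a second copy of the sensitive data.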

What to Look For

When evaluating voice AI platforms for ERP integration, the technical checklist includes:

  • Native connectors for your specific ERP, or documented middleware patterns
  • Measured end-to-end latency numbers (not theoretical—production measurements)
  • Bi-directional capability with confirmation flows and rollback
  • Real-time query support, not just cached data access
  • Graceful degradation when backend systems are slow or unavailable
  • Full audit logging with voice session context
  • Permission model that mirrors existing access controls
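As an illustration of the graceful degradation item, the sketch below wraps a backend query in a timeout and falls back to an honest spoken message instead of leaving the caller in silence. The 0.5-second default mirrors the upper end of the backend query range discussed earlier; the function names and the wording of the replies are assumptions.

```python
import asyncio

BACKEND_TIMEOUT_S = 0.5  # upper end of the backend query range in the latency budget

async def query_with_fallback(query, timeout_s=BACKEND_TIMEOUT_S):
    """Await query(); on timeout or connection failure, return a fallback reply."""
    try:
        units = await asyncio.wait_for(query(), timeout=timeout_s)
        return f"There are {units} units in stock."
    except (asyncio.TimeoutError, ConnectionError):
        return ("The inventory system is slow to respond right now, "
                "so I can't confirm stock. I'll follow up once it's available.")

# Two stand-ins for the ERP call: one responsive, one far too slow.
async def fast_erp():
    await asyncio.sleep(0.01)
    return 4

async def slow_erp():
    await asyncio.sleep(5)
    return 4

fast = asyncio.run(query_with_fallback(fast_erp))
slow = asyncio.run(query_with_fallback(slow_erp, timeout_s=0.05))
print(fast)
print(slow)
```

The fallback deliberately does not read out a cached number: for inventory, saying "I can't confirm" is safer than repeating a count that may be stale.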

The goal is simple: when someone talks to the voice agent, they're interacting with their actual business systems in real time. Not a simulation. Not a cache. The real thing, live. That's what makes voice AI a business tool rather than a demo.
