Inworld + LiveKit: Unlocking studio-quality voice AI for real-time experiences at scale

It's time to bring your most ambitious AI applications to life with emotionally intelligent, real-time voice AI from Inworld. You can now use Inworld's pre-built voices, or clone your own from a few seconds of audio, through Inworld's API and LiveKit's Agents framework. Inworld's multilingual, expressive voices deliver state-of-the-art quality at real-time latency for about 5% of the cost of alternatives. You can learn more about Inworld's text-to-speech (TTS) models here.

Why Inworld + LiveKit for voice AI

You can now access Inworld voices and text-to-speech models via LiveKit's Agents framework plugin. This makes it easier for developers to create previously unimaginable, real-time voice experiences such as multiplayer games, agentic NPCs, customer-facing avatars, live training simulations, and more at an accessible price.

Experience Inworld TTS in a voice-driven tabletop RPG built by the LiveKit team. You can access the GitHub code repository to build your own voice-first, multi-agent game experience.

Natural, conversational speech: Combine LiveKit's programmable audio pipeline with Inworld's state-of-the-art TTS and temperature controls for emotionally grounded, turn-based dialogue and instant feedback. Control voice switching and audio routing dynamically.
Accessible pricing: Studio-quality voices for just $5 per million characters, about 5% of the cost of TTS from leading labs, so you can build engaging experiences that scale with your users.
Real-time latency: Generate and stream Inworld voices with under 200 ms of latency to the first audio chunk via LiveKit's global edge infrastructure, which has proven reliability in high-concurrency environments and is ideal for real-time experiences.
Zero-shot voice cloning: Leverage Inworld's voice cloning capabilities to bring characters, brands, user-generated content, and more to life with emotion and personality using just 5-15 seconds of audio.
Multilingual voices: Build agents in 11 of the most common languages for consumers, including English (with its various accents), Chinese, Korean, Dutch, French, Spanish, and more. You can also preserve accents for a specific voice when switching languages.
Designed for developers: Build consumer applications with LiveKit's SDKs for web, mobile, and Unity. Craft custom voice pipelines using third-party integrations, RAG, and function calling. LiveKit's Agents framework also comes with performance metrics and debugging tools.
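
To make the pricing point above concrete, here is a minimal Python sketch of the arithmetic, assuming the $5 per million characters rate quoted above. The function name and the traffic figures in the example are hypothetical, chosen only for illustration, and actual billing may differ.

```python
# Rate quoted in this post: $5 per 1,000,000 characters of synthesized text.
PRICE_PER_MILLION_CHARS_USD = 5.00


def estimate_tts_cost(num_characters: int,
                      price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Return the estimated cost in USD for synthesizing `num_characters`."""
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return num_characters / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical workload: 10,000 daily users, 20 spoken replies each,
    # averaging 150 characters per reply.
    daily_characters = 10_000 * 20 * 150
    print(f"{daily_characters:,} characters/day")
    print(f"Estimated cost: ${estimate_tts_cost(daily_characters):,.2f}/day")
```

Swapping in your own user counts and average reply length gives a quick first estimate of how TTS spend would grow with usage.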

Built for builders

Get started in just minutes:

Use LiveKit's Agents framework to stream audio via Inworld's TTS endpoint with the LLM and STT providers of your choice.
Configure voice parameters, temperature control, and even language switching using Inworld's API.
Deploy your agent to LiveKit's global infrastructure, allowing you to speak to your agent with real-time latency from anywhere in the world.

Ready to start building? Explore additional documentation to get started.

LiveKit Text-to-Speech Documentation
Inworld Text-to-Speech Overview
Inworld Text-to-Speech Documentation

Inworld x LiveKit collaboration

Whether you are building immersive games, voice-first apps, or agentic tools, Inworld + LiveKit is designed to give you full-stack control with real-world performance.

On June 17, 2025, Inworld and LiveKit hosted a Realtime AI Meetup in San Francisco. Hundreds of voice AI developers, founders, and enthusiasts gathered to explore how text-to-speech and speech-to-text models are built, key considerations for development, and how to maximize their potential in AI agents.

“Our TTS modeling framework allows us to advance voice AI's emotional and contextual understanding and easily add new functionality, while keeping costs affordable. This helps democratize access to building high-quality, real-time voice experiences.”

Jean Wang, Inworld Head of Product.

“Latency has a direct impact on user experience, and developers must consider how to manage it effectively. Fast models help, but efficient data streaming, optimized network communication, and prompting can also be crucial. Metrics like 'first response latency' are key. Developers should consider this not only in the models they use, but also in how they implement their applications.”

Michael Solati, LiveKit Developer Advocate.
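
One way to track a metric like the 'first response latency' mentioned above is to time how long a stream takes to yield its first audio chunk. The sketch below is a minimal Python illustration of that idea, not LiveKit's or Inworld's API: `fake_tts_stream` is a hypothetical stand-in for a real streaming TTS response, and its delays and chunk sizes are made-up values.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_tts_stream(n_chunks: int = 3,
                          first_chunk_delay_s: float = 0.05,
                          chunk_delay_s: float = 0.01) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption)."""
    for i in range(n_chunks):
        # The first chunk takes longer, mimicking model warm-up and network time.
        await asyncio.sleep(first_chunk_delay_s if i == 0 else chunk_delay_s)
        yield b"\x00" * 320  # placeholder audio bytes


async def measure_first_chunk_latency(
    stream: AsyncIterator[bytes],
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first chunk, total chunks received).

    The latency is None if the stream ends without yielding any chunk.
    """
    start = time.perf_counter()
    first_chunk_latency: Optional[float] = None
    chunk_count = 0
    async for _chunk in stream:
        if first_chunk_latency is None:
            first_chunk_latency = time.perf_counter() - start
        chunk_count += 1
    return first_chunk_latency, chunk_count


if __name__ == "__main__":
    latency, chunks = asyncio.run(measure_first_chunk_latency(fake_tts_stream()))
    print(f"first chunk after {latency * 1000:.0f} ms, {chunks} chunks total")
```

In a real application, the same wrapper could be pointed at whatever async audio stream your pipeline exposes, and the timer would ideally start when the user stops speaking so that the measurement reflects the experience end to end.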
