PyPI: chorus-tts · Python · TTS · Audio · AI · Open Source

Chorus

Turn written scripts into professional multi-voice audio — podcasts, voiceovers, and narrations, produced automatically.

🔬 Note

chorus-tts 1.0.0 — available now on PyPI. Built in Python, MIT licensed, runs on CPU or GPU.

The problem with creating audio content today

Artificial voices have gotten genuinely good. The speech sounds warm and natural, the pacing is convincing, and the latest technology can read a paragraph in a way that doesn't sound robotic at all.

But turning that technology into finished audio content? Still a painful process.

Want to create a conversation between two people? You're manually stitching audio clips together. Want to produce a podcast episode from a written script? You're spending more time in audio editing software than on writing your content. Want to produce consistent voice content at scale across multiple episodes, speakers, or languages? You need a production team, a studio budget, or both.

We kept hitting the same ceiling on internal projects. The voice technology itself wasn't the problem — it was everything around it.

⚠️ Problem

What was missing:

  • No easy way to create multi-speaker conversations
  • Voice management was treated as an afterthought
  • No structured way to go from a written script to a finished audio file
  • Every project required rebuilding the same audio production pipeline
  • Manual audio editing for every pause, transition, and speaker change

So we built the production layer we wished existed.


Why we built Chorus

Chorus started as an internal tool. We were generating voiced demos for a client product, and after the third time manually piecing together audio segments, we knew there had to be a better way.

Then we needed multiple voices in the same production. Then we needed consistent voice quality across episodes. Then we needed proper progress feedback, because rendering audio can take a while, and staring at a silent screen wondering whether something crashed is nobody's idea of a good time. Each addition was small, but together they turned into something genuinely useful: a focused production tool that turns raw scripts into polished audio.

The core idea: scripts in, audio out. You should be able to write a conversation as plain text and get a finished, professional-sounding audio file back. No audio editing, no manual stitching, no fighting with production software.

Under the hood, Chorus is powered by Chatterbox Turbo, one of the most natural-sounding voice synthesis engines available. We didn't want to rebuild the voice — we wanted to rebuild the experience around it.


What Chorus actually is

Chorus is a Python library with two main capabilities, designed to work together seamlessly:

1. A voice synthesis engine with built-in voice management

Voices are first-class citizens. Drop a short audio sample into the library and Chorus picks it up automatically — no configuration, no registration, no setup wizard. Listing available voices is one command. Generating audio for a given text and voice is another. It just works.

2. A podcast and dialogue generator for multi-speaker content

Define a conversation as a list of segments — who speaks and what they say. Chorus produces each segment with the correct voice, handles the transitions and pacing between speakers, and returns a single, polished audio file ready to publish or distribute. This is the part we use most — it's the difference between "a text-to-speech call" and "a finished production."

Here's what one of those generated conversations actually sounds like, produced from a short two-person script using bundled voice profiles:

♪ Listen: Sample podcast (two voices, one script)
Generated end-to-end with chorus-tts. Two speakers, no manual editing.

That clip is the whole pitch in 30 seconds. You wrote a script; Chorus handed you back a finished audio file.


How it works

Install the library and you have a working audio production studio:

pip install chorus-tts

from chorus import Chorus

# Initialize — works on CPU or GPU
chorus = Chorus(device="cpu")

# See what voices are available
voices = chorus.tts.list_voices()
print("Available voices:", voices)
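The automatic voice pickup described earlier can be pictured as a simple directory scan. The sketch below only illustrates that idea and is not Chorus's actual implementation: the `discover_voices` helper, the folder layout, and the set of accepted file extensions are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Assumption: which file types count as voice samples is hypothetical.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def discover_voices(voice_dir: Path) -> dict[str, Path]:
    """Map a voice name (the lowercased file stem) to its sample file.

    Hypothetical helper, not part of chorus-tts.
    """
    voices: dict[str, Path] = {}
    for path in sorted(voice_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            voices[path.stem.lower()] = path
    return voices


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("charlie.wav", "Emilia.flac", "notes.txt"):
        (folder / name).touch()
    demo_voices = sorted(discover_voices(folder))

print(demo_voices)
```

Keying voices by file name is what would make "drop a file in, use it by name" possible without a registration step; non-audio files such as `notes.txt` are simply ignored.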

Single-voice synthesis

The simplest case — convert any text to natural-sounding audio using a named voice:

import torchaudio

wav = chorus.tts.convert("Hello, world!", voice="charlie")
torchaudio.save("output.wav", wav, chorus.tts.model.sr)

No session setup, no model configuration. The library handles everything behind the scenes.

Multi-speaker conversations and podcasts

This is where Chorus earns its name. Write your script as structured data:

from chorus import ConversationInput

data: ConversationInput = {
    "segments": [
        {"voice": "charlie", "text": "Hello Emilia! How was your weekend?"},
        {"voice": "emilia", "text": "Hi Charlie! It was great — went hiking up the coast."},
        {"voice": "charlie", "text": "Nice. Any photos worth showing?"},
        {"voice": "emilia", "text": "A few. I'll send them over after the call."},
    ]
}

# Generate the full conversation as a single audio file
podcast_wav = chorus.tts.podcast.create(data)

# Save it
import torchaudio
torchaudio.save("podcast.wav", podcast_wav, chorus.tts.model.sr)

One call produces the entire conversation — voice switching, transitions, pacing, and final output all handled automatically. A twelve-segment script becomes one clean audio file with zero manual editing.
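Because the goal is scripts in, audio out, a script written as plain `Speaker: line` text can be converted into the segment structure shown above before it is handed to the library. This is a minimal sketch under stated assumptions: `parse_script` and the `Speaker: line` format are hypothetical and not part of chorus-tts; only the `segments` / `voice` / `text` layout comes from the example above.

```python
def parse_script(script: str, known_voices: set[str]) -> dict:
    """Turn 'Speaker: line' text into {"segments": [...]}.

    Hypothetical helper, not part of chorus-tts.
    """
    segments = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines only separate turns
        # Split on the first colon so the spoken text may contain colons.
        speaker, sep, text = line.partition(":")
        voice = speaker.strip().lower()
        text = text.strip()
        if not sep or not text:
            raise ValueError(f"line {number}: expected 'Speaker: text'")
        if voice not in known_voices:
            raise ValueError(f"line {number}: unknown voice {voice!r}")
        segments.append({"voice": voice, "text": text})
    return {"segments": segments}


script = """
Charlie: Hello Emilia! How was your weekend?
Emilia: Hi Charlie! It was great.
"""
data = parse_script(script, known_voices={"charlie", "emilia"})
print(data["segments"])
```

The resulting dictionary has the same shape as the `data` example above, so it could be passed to `chorus.tts.podcast.create` in the same way. Checking voice names up front means a typo in a speaker label fails immediately rather than partway through a long render; in practice the set of known voices would be built from whatever `chorus.tts.list_voices()` returns.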


What's next

Chorus is intentionally focused. It's a production tool, not a platform. That said, the roadmap has some useful additions:

  • Pause and pacing controls — fine-tune silence between segments for natural conversation flow
  • Emphasis and pronunciation markup — control how specific words or phrases are spoken
  • Streaming generation — produce audio as it's being generated rather than waiting for the entire file
  • Voice cloning — add a new voice from a short reference sample, no audio engineering required
  • Background music and intro/outro mixing — finished episodes with bumpers and music, not just raw speech

If you've ever needed to turn written content into multi-voice audio and found yourself spending more time in audio editing software than on your actual content, Chorus is for you.


📦 Install

Get started in one line:

pip install chorus-tts

Then from chorus import Chorus and you're producing audio in three lines of code. The source is on GitHub — voice contributions and PRs welcome.