Overview

Conversimple supports two conversation modes, each optimized for different use cases:
  1. STT Mode (Speech-to-Text → LLM → Text-to-Speech): Traditional pipeline with maximum flexibility
  2. STS Mode (Speech-to-Speech): Unified pipeline for ultra-low latency

STT Mode: Maximum Flexibility

Architecture

User Speech
    ↓ Audio Input
Speech-to-Text Service (Gemini Live STT)
    ↓ Transcription
Large Language Model (Gemini 2.5 Pro)
    ↓ Generated Text Response
Text-to-Speech Service (Gemini TTS)
    ↓ Audio Output
User Speaker

When to Use STT Mode

Custom LLM Logic

Need to customize LLM behavior, prompts, or temperature settings

Multi-Provider

Want to use different providers for STT, LLM, and TTS

Processing Pipeline

Need to process or transform text between stages

Advanced Control

Require fine-grained control over each stage

Characteristics

Latency: Under 1 second typical response time
Flexibility: Very High
  • Separate configuration for each service
  • Custom prompt engineering
  • Text transformation between stages
  • Provider mixing (e.g., Deepgram STT + OpenAI LLM + ElevenLabs TTS)
Use Cases:
  • Complex conversation logic
  • Custom LLM prompting strategies
  • Multi-language support with specific providers
  • Advanced text processing requirements
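
As a concrete example of "text transformation between stages", a mid-pipeline hook might redact sensitive data from the transcript before it reaches the LLM. The function below is a hypothetical sketch, not a Conversimple API:

```python
import re

def redact_transcript(text: str) -> str:
    """Illustrative mid-pipeline transform: mask digit runs that look
    like card or account numbers before the text reaches the LLM."""
    return re.sub(r"\b\d{4,}\b", "[REDACTED]", text)

# The transformed text, not the raw transcript, would be forwarded
# to the LLM stage.
safe_text = redact_transcript("My account number is 12345678")
```

The same pattern applies to any between-stage processing, such as normalizing spoken numbers or stripping filler words.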

Example: STT Mode Configuration

from conversimple import ConversimpleAgent, tool

class CustomAgent(ConversimpleAgent):
    """Agent using STT mode for maximum flexibility"""

    def __init__(self, **kwargs):
        super().__init__(
            mode="stt",  # Explicit STT mode
            stt_provider="gemini_live",
            llm_provider="gemini_pro",
            llm_config={
                "temperature": 0.7,
                "system_instruction": "You are a helpful assistant..."
            },
            tts_provider="gemini_live",
            **kwargs
        )

    @tool("Get customer information")
    def get_customer(self, customer_id: str) -> dict:
        return {"name": "John", "tier": "premium"}

STS Mode: Ultra-Low Latency

Architecture

User Speech
    ↓ Audio Input
Gemini Live STS Service
    ↓ Complete Speech-to-Speech Processing
User Speaker

When to Use STS Mode

Ultra-Low Latency

Need the fastest possible response times

Natural Flow

Want the most natural conversation dynamics

Simplified Stack

Prefer fewer moving parts and dependencies

Gemini Optimized

Leverage Gemini’s native speech-to-speech capabilities

Characteristics

Latency: Ultra-low, typically under 600ms
  • Single unified service for fastest response
  • Approximately 2x faster than STT mode
  • Better interruption handling
Flexibility: Moderate
  • Single provider (currently Gemini Live)
  • Less control over individual stages
  • Function calling fully supported
  • Optimized for conversation flow
Use Cases:
  • Customer service chatbots
  • Real-time support agents
  • Interactive voice assistants
  • Natural conversation experiences

Example: STS Mode Configuration

from conversimple import ConversimpleAgent, tool

class FastAgent(ConversimpleAgent):
    """Agent using STS mode for minimal latency"""

    def __init__(self, **kwargs):
        super().__init__(
            mode="sts",  # Speech-to-Speech mode
            sts_provider="gemini_live",
            system_instruction="You are a helpful assistant...",
            **kwargs
        )

    @tool("Get customer information")
    def get_customer(self, customer_id: str) -> dict:
        return {"name": "John", "tier": "premium"}

Comparison

Feature           | STT Mode            | STS Mode
Latency           | < 1 second          | < 600 ms (ultra-low)
Providers         | Mix & match         | Single provider
Flexibility       | Very High           | Moderate
Setup Complexity  | Higher              | Lower
Function Calling  | ✅ Supported        | ✅ Supported
Interruptions     | Good                | Excellent
Custom Prompts    | Full control        | System instruction
Multi-language    | Provider-specific   | Gemini languages
Cost              | Per-service pricing | Single-service pricing
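
The table can be condensed into a rule of thumb. The helper below is an illustrative sketch, not part of the SDK: any requirement for per-stage control forces STT mode; otherwise STS is the simpler, lower-latency default.

```python
def recommend_mode(needs_custom_llm: bool = False,
                   mixes_providers: bool = False) -> str:
    """Rule of thumb distilled from the comparison table: per-stage
    control requires STT; otherwise prefer STS for latency."""
    if needs_custom_llm or mixes_providers:
        return "stt"
    return "sts"
```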

Function Calling Support

Both modes fully support function calling:

STT Mode Function Calling

User: "Book me a flight to NYC"
    ↓ STT
"Book me a flight to NYC"
    ↓ LLM (decides to call tool)
tool_call: book_flight(destination="NYC")
    ↓ Your Agent
{"booking_id": "ABC123", "price": 450}
    ↓ LLM (generates response)
"I've booked your flight to NYC for $450"
    ↓ TTS
Audio: "I've booked your flight..."

STS Mode Function Calling

User: "Book me a flight to NYC"
    ↓ Gemini Live STS
tool_call: book_flight(destination="NYC")
    ↓ Your Agent
{"booking_id": "ABC123", "price": 450}
    ↓ Gemini Live STS
Audio: "I've booked your flight..."

Function calling works identically in both modes; the only difference is the processing pipeline.
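
Because tool handling is mode-agnostic, the agent side of the exchange can be sketched without reference to the pipeline. The registry and call format below are illustrative; the SDK presumably builds the registry from `@tool`-decorated methods:

```python
def book_flight(destination: str) -> dict:
    # Stand-in for the agent's real business logic.
    return {"booking_id": "ABC123", "price": 450}

# Illustrative registry mapping tool names to handlers.
TOOLS = {"book_flight": book_flight}

def handle_tool_call(name: str, arguments: dict) -> dict:
    """Dispatch a tool call from either pipeline to agent code."""
    return TOOLS[name](**arguments)

# Either pipeline would deliver the same call and receive the same result.
result = handle_tool_call("book_flight", {"destination": "NYC"})
```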

Choosing the Right Mode

Choose STT Mode If:

  • You need to use specific providers (e.g., OpenAI, Deepgram, ElevenLabs)
  • You require custom LLM configuration or prompt engineering
  • You need to process or transform text between stages
  • You want maximum control over each component
  • You need flexibility to mix and match AI services

Choose STS Mode If:

  • Minimal latency is critical for your use case
  • You want the simplest architecture
  • Natural conversation flow is a priority
  • You’re comfortable with Gemini Live as your provider
  • You prefer fewer dependencies to manage

Switching Between Modes

You can easily switch between modes by changing the configuration:
# Development: Use STS for fast iteration
dev_agent = MyAgent(mode="sts", sts_provider="gemini_live")

# Production: Switch to STT for custom LLM
prod_agent = MyAgent(
    mode="stt",
    stt_provider="deepgram",
    llm_provider="openai",
    tts_provider="elevenlabs"
)
Your tool definitions and business logic remain unchanged.
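
Because only constructor arguments differ, the switch can also be driven from configuration. The sketch below is illustrative: the `AGENT_MODE` variable name is arbitrary and the provider names simply mirror the example above; the resulting dict would be passed as `MyAgent(**agent_kwargs())`.

```python
import os

def agent_kwargs(mode: str = "") -> dict:
    """Build constructor kwargs for either pipeline; the mode can come
    from configuration (here, a hypothetical AGENT_MODE variable)."""
    mode = mode or os.environ.get("AGENT_MODE", "sts")
    if mode == "sts":
        return {"mode": "sts", "sts_provider": "gemini_live"}
    return {
        "mode": "stt",
        "stt_provider": "deepgram",
        "llm_provider": "openai",
        "tts_provider": "elevenlabs",
    }
```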

Best Practices

For STT Mode

  • Optimize LLM prompts for your use case
  • Consider provider costs and rate limits
  • Test latency across the full pipeline
  • Monitor each service independently
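
One way to act on "test latency across the full pipeline" is to time each stage separately so you can see where the budget goes. The stage functions below are stubs (assumptions, not Conversimple APIs); replace them with real service calls:

```python
import time

def time_stage(fn, *args):
    """Run one pipeline stage and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stubs standing in for the real STT, LLM, and TTS calls.
stt = lambda audio: "book me a flight"
llm = lambda text: "Sure, booking now."
tts = lambda text: b"<audio bytes>"

timings = {}
text, timings["stt"] = time_stage(stt, b"<mic audio>")
reply, timings["llm"] = time_stage(llm, text)
audio, timings["tts"] = time_stage(tts, reply)
timings["total"] = sum(timings.values())
```

Logging these per-stage numbers in production makes it obvious whether STT, the LLM, or TTS is dominating the sub-second budget.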

For STS Mode

  • Use for latency-critical applications
  • Leverage Gemini’s natural conversation capabilities
  • Test interruption handling thoroughly
  • Monitor overall conversation quality

Next Steps