openccb/docs/BARK_TTS_GUIDE.md
2026-03-17 12:07:56 -03:00


Bark TTS Integration Guide

Overview

OpenCCB now integrates with Suno AI's Bark text-to-speech system for generating audio versions of questions. This allows students to listen to questions instead of just reading them, improving accessibility and supporting different learning styles.

Architecture

┌─────────────────┐     HTTP      ┌─────────────────┐
│   OpenCCB CMS   │ ────────────> │   Bark TTS API  │
│  (PostgreSQL)   │ <──────────── │   (Server t-800)│
│                 │    Audio      │                 │
└─────────────────┘               └─────────────────┘

Deployment to t-800 Server

Prerequisites

  • SSH access to t-800 server
  • At least 8GB RAM recommended (Bark loads large models)
  • 10GB free disk space
  • Python 3.8+
  • GPU optional (CUDA support for faster generation)

Quick Deploy

# From your local machine
cd /home/juan/dev/openccb
./scripts/deploy_to_t800.sh

This will:

  1. SSH into t-800
  2. Install Python dependencies
  3. Clone Bark repository
  4. Set up systemd service
  5. Start the API server

Manual Deploy

# SSH into t-800
ssh juan@t-800

# Run installation script
wget https://raw.githubusercontent.com/suno-ai/bark/main/scripts/install.sh
sudo bash install.sh

API Endpoints

Once deployed, the Bark API is available at http://t-800:8000

Health Check

curl http://t-800:8000/health

List Available Voices

curl http://t-800:8000/api/voices

Generate Speech

# Basic usage
curl "http://t-800:8000/api/generate?text=What%20color%20is%20the%20sky%3F" \
  -o question.wav

# With specific voice and speed
curl "http://t-800:8000/api/generate?text=Hello%20World&voice=v2/en_speaker_6&speed=1.2" \
  -o greeting.wav

# Spanish voice
curl "http://t-800:8000/api/generate?text=Hola%20mundo&voice=v2/es_speaker_0" \
  -o saludo.wav
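
The URLs above are percent-encoded by hand, which is easy to get wrong for accented or punctuated text. The sketch below lets curl do the encoding with -G and --data-urlencode. The function name bark_generate and its argument order are hypothetical; the endpoint and parameter names are the ones shown above, and the sketch assumes the server decodes percent-encoded values (including the %2F that curl produces for the slash in a voice name).

```shell
# Hypothetical helper: bark_generate TEXT OUTPUT_FILE [VOICE] [SPEED]
# Assumes the /api/generate endpoint and the text/voice/speed parameters shown above.
bark_generate() {
  text=$1
  out=$2
  voice=${3:-v2/en_speaker_1}
  speed=${4:-1.0}
  base=${BARK_API_URL:-http://t-800:8000}

  # -G sends the --data-urlencode pairs as a query string on a GET request.
  # -f makes curl exit non-zero on HTTP errors instead of saving the error body.
  curl -fsS -G "$base/api/generate" \
    --data-urlencode "text=$text" \
    --data-urlencode "voice=$voice" \
    --data-urlencode "speed=$speed" \
    -o "$out"
}
```

For example, bark_generate "¿De qué color es el cielo?" pregunta.wav v2/es_speaker_0 would request the Spanish voice without any manual escaping.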

Available Voices

English Voices

  • v2/en_speaker_0 through v2/en_speaker_9

Spanish Voices

  • v2/es_speaker_0 through v2/es_speaker_9
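
Because the voice IDs listed above follow a fixed pattern, a script can reject an obviously wrong ID before sending a request. This is only a sketch: is_known_voice is a hypothetical name, and it checks against the two ranges listed in this guide, so the /api/voices endpoint remains the authoritative list.

```shell
# Hypothetical helper: succeeds only for IDs in the ranges listed above
# (v2/en_speaker_0..9 and v2/es_speaker_0..9).
is_known_voice() {
  case "$1" in
    v2/en_speaker_[0-9] | v2/es_speaker_[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might write: is_known_voice "$voice" || echo "unknown voice: $voice" >&2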

Integration with OpenCCB

Generate Audio for a Question

# Via API
curl -X POST "http://localhost:3001/question-bank/{question_id}/generate-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "What color is the sky?",
    "voice": "v2/en_speaker_1",
    "speed": 1.0
  }'

Automatic Audio Generation

When creating a question, set generate_audio to true to trigger asynchronous audio generation:

POST /question-bank
{
  "question_text": "What is the capital of France?",
  "question_type": "multiple-choice",
  "options": ["Paris", "London", "Berlin", "Madrid"],
  "correct_answer": 0,
  "explanation": "Paris is the capital of France.",
  "generate_audio": true
}

Configuration

Environment Variables

Add to your .env file:

# Bark TTS API URL
BARK_API_URL=http://t-800:8000

# Optional: Default voice for audio generation
BARK_DEFAULT_VOICE=v2/en_speaker_1

# Optional: Default speed
BARK_DEFAULT_SPEED=1.0
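
Helper scripts that call Bark directly can reuse the same settings. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines that a POSIX shell can source; load_bark_env is a hypothetical name, and the fallback values are the ones listed above.

```shell
# Hypothetical helper: load_bark_env [ENV_FILE]
# Sources ENV_FILE (default ./.env) if it exists, then applies the documented defaults.
load_bark_env() {
  env_file=${1:-./.env}
  # The "." builtin searches PATH for names without a slash, so force a path.
  case "$env_file" in
    */*) ;;
    *) env_file=./$env_file ;;
  esac
  if [ -f "$env_file" ]; then
    set -a            # export every variable assigned while sourcing
    . "$env_file"
    set +a
  fi
  # Fall back to the documented defaults for anything still unset or empty.
  : "${BARK_API_URL:=http://t-800:8000}"
  : "${BARK_DEFAULT_VOICE:=v2/en_speaker_1}"
  : "${BARK_DEFAULT_SPEED:=1.0}"
  export BARK_API_URL BARK_DEFAULT_VOICE BARK_DEFAULT_SPEED
}
```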

Performance Optimization

Model Preloading

Bark preloads models on startup (takes ~30 seconds). The systemd service handles this automatically.
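
A deploy or import script may want to wait for preloading to finish before sending requests. This sketch polls the health endpoint with a bounded number of retries. The name wait_for_bark and the retry defaults are hypothetical, and it assumes /health returns a success status only once the server is ready, which this guide does not state.

```shell
# Hypothetical helper: wait_for_bark [MAX_ATTEMPTS] [DELAY_SECONDS]
# Polls the /health endpoint shown earlier until it answers successfully.
wait_for_bark() {
  attempts=${1:-30}
  delay=${2:-2}
  base=${BARK_API_URL:-http://t-800:8000}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP error statuses into a non-zero exit code.
    if curl -fsS -o /dev/null "$base/health"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Bark API not healthy after $attempts attempts" >&2
  return 1
}
```

With the defaults this waits up to about a minute, comfortably longer than the ~30 second preload.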

Memory Management

The systemd service includes memory limits:

MemoryMax=4G
MemoryHigh=3G

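Adjust based on your server's capacity. MemoryHigh is the soft threshold above which systemd throttles the service and reclaims memory; MemoryMax is the hard limit at which the process is killed.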
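
One way to change the limits without editing the installed unit file is a systemd drop-in. The sketch below assumes the unit is named bark-tts.service (the name used in the troubleshooting commands) and that units live under /etc/systemd/system; write_bark_memory_override is a hypothetical helper, and the 6G/5G values in the usage comment are only illustrative.

```shell
# Hypothetical helper: write_bark_memory_override MAX HIGH [UNIT_DIR]
# Writes a systemd drop-in that overrides the memory limits for bark-tts.
write_bark_memory_override() {
  max=$1
  high=$2
  unit_dir=${3:-/etc/systemd/system}
  dropin_dir=$unit_dir/bark-tts.service.d

  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<EOF
[Service]
MemoryMax=$max
MemoryHigh=$high
EOF
}

# Typical use on the server (needs root), followed by a reload and restart:
#   write_bark_memory_override 6G 5G
#   systemctl daemon-reload && systemctl restart bark-tts
```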

Batch Generation

For importing many questions:

# Generate audio for multiple questions
curl "http://t-800:8000/api/generate/batch?texts=Question%201&texts=Question%202&voice=v2/en_speaker_1"
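
For more than a couple of questions, building the repeated texts parameter by hand becomes error-prone. The sketch below assembles the same request from a list of arguments. bark_batch is a hypothetical name; the endpoint and the texts/voice parameters are taken from the example above, and because this guide does not describe the response format, the function simply writes the response to standard output.

```shell
# Hypothetical helper: bark_batch VOICE TEXT [TEXT...]
# Assumes the /api/generate/batch endpoint and its repeated "texts" parameter.
bark_batch() {
  voice=$1
  shift
  # Rewrite the argument list in place: each TEXT becomes a
  # "--data-urlencode texts=TEXT" pair, so no bash arrays are needed.
  for t in "$@"; do
    set -- "$@" --data-urlencode "texts=$t"
    shift
  done
  curl -fsS -G "${BARK_API_URL:-http://t-800:8000}/api/generate/batch" \
    "$@" --data-urlencode "voice=$voice"
}
```

For example: bark_batch v2/en_speaker_1 "Question 1" "Question 2"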

Troubleshooting

Service Not Starting

# Check status
sudo systemctl status bark-tts

# View logs
sudo journalctl -u bark-tts -f

# Restart service
sudo systemctl restart bark-tts

Out of Memory

If Bark crashes due to memory:

  1. Raise MemoryMax in the systemd service if the server has spare RAM (a limit set too low gets the process killed)
  2. Use smaller models: suno/bark-small
  3. Process questions one at a time

Slow Generation

  • GPU acceleration: Install CUDA-enabled PyTorch
  • Reduce audio quality settings
  • Use shorter text segments

Testing

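# Test English voice (play is part of SoX; -t wav declares the format of the piped audio)
curl -s "http://t-800:8000/api/generate?text=The%20quick%20brown%20fox&voice=v2/en_speaker_1" | play -t wav -

# Test Spanish voice (accented characters percent-encoded as UTF-8)
curl -s "http://t-800:8000/api/generate?text=El%20r%C3%A1pido%20zorro%20marr%C3%B3n&voice=v2/es_speaker_0" | play -t wav -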

Security Notes

  • Bark API runs on internal network only
  • No authentication required (assumes trusted network)
  • Rate limiting handled by OpenCCB
  • Audio files stored in uploads/audio/ directory

Future Enhancements

  • Add authentication to Bark API
  • Support for custom voice cloning
  • Audio preprocessing (noise reduction, normalization)
  • Caching layer for repeated requests
  • WebSocket support for streaming audio
