Bark TTS Integration Guide
Overview
OpenCCB now integrates with Suno AI's Bark text-to-speech system for generating audio versions of questions. This allows students to listen to questions instead of just reading them, improving accessibility and supporting different learning styles.
Architecture
┌─────────────────┐     HTTP      ┌─────────────────┐
│   OpenCCB CMS   │ ────────────> │  Bark TTS API   │
│  (PostgreSQL)   │ <──────────── │ (Server t-800)  │
│                 │     Audio     │                 │
└─────────────────┘               └─────────────────┘
Deployment to t-800 Server
Prerequisites
- SSH access to t-800 server
- At least 8GB RAM recommended (Bark loads large models)
- 10GB free disk space
- Python 3.8+
- GPU optional (CUDA support for faster generation)
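Before running either deploy path, a quick pre-flight check on t-800 can confirm these prerequisites. The following is a minimal bash sketch that assumes a Linux host with /proc/meminfo, df and python3. The function names (python_version_ok, meets_minimum, preflight) are hypothetical. The RAM threshold is deliberately a little under 8 GiB because MemTotal excludes kernel-reserved memory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the t-800 prerequisites (Linux only).
# MemTotal excludes kernel-reserved memory, so an "8 GB" host reports a little
# under 8 GiB; the RAM threshold therefore allows roughly 0.5 GiB of slack.
MIN_RAM_KB=$(( 7 * 1024 * 1024 + 512 * 1024 ))   # ~7.5 GiB in KiB (/proc/meminfo units)
MIN_DISK_KB=$(( 10 * 1024 * 1024 ))              # 10 GiB in KiB (df -k units)

# Succeeds if a "MAJOR.MINOR[.PATCH]" version string is 3.8 or newer.
python_version_ok() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  (( ${major:-0} > 3 || ( ${major:-0} == 3 && ${minor:-0} >= 8 ) ))
}

# Succeeds if the first KiB value is at least the second (empty counts as 0).
meets_minimum() {
  (( ${1:-0} >= ${2:-0} ))
}

preflight() {
  local target="${1:-$HOME}" ram_kb disk_kb py_ver status=0
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  disk_kb=$(df -Pk "$target" | awk 'NR == 2 {print $4}')
  py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)

  meets_minimum "$ram_kb" "$MIN_RAM_KB"   || { echo "WARN: less than ~8 GB RAM" >&2; status=1; }
  meets_minimum "$disk_kb" "$MIN_DISK_KB" || { echo "WARN: less than 10 GB free on $target" >&2; status=1; }
  python_version_ok "${py_ver:-0.0}"      || { echo "WARN: Python 3.8+ not found" >&2; status=1; }
  # The GPU is optional; only report whether NVIDIA tooling is present.
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "INFO: nvidia-smi found (CUDA may be available)"
  fi
  return "$status"
}
```

Calling `preflight` (optionally with the directory Bark will be installed into) prints one warning per unmet requirement and returns non-zero if any check fails.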
Quick Deploy
# From your local machine
cd /home/juan/dev/openccb
./scripts/deploy_to_t800.sh
This will:
- SSH into t-800
- Install Python dependencies
- Clone Bark repository
- Set up systemd service
- Start the API server
Manual Deploy
# SSH into t-800
ssh juan@t-800
# Run installation script
wget https://raw.githubusercontent.com/suno-ai/bark/main/scripts/install.sh
sudo bash install.sh
API Endpoints
Once deployed, the Bark API is available at http://t-800:8000.
Health Check
curl http://t-800:8000/health
List Available Voices
curl http://t-800:8000/api/voices
Generate Speech
# Basic usage
curl "http://t-800:8000/api/generate?text=What%20color%20is%20the%20sky%3F" \
-o question.wav
# With specific voice and speed
curl "http://t-800:8000/api/generate?text=Hello%20World&voice=v2/en_speaker_6&speed=1.2" \
-o greeting.wav
# Spanish voice
curl "http://t-800:8000/api/generate?text=Hola%20mundo&voice=v2/es_speaker_0" \
-o saludo.wav
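Question text usually contains spaces and punctuation that must be percent-encoded, as with the %20 and %3F escapes in these examples. A small helper can build such URLs. This is a sketch only: urlencode and bark_generate_url are hypothetical names, and BARK_API_URL falls back to http://t-800:8000 when it is not already set.

```shell
#!/usr/bin/env bash
# Sketch: build /api/generate URLs with percent-encoded parameters.
# BARK_API_URL matches the .env key in this guide; the fallback is an assumption.
BARK_API_URL="${BARK_API_URL:-http://t-800:8000}"

# Percent-encode a string byte by byte; RFC 3986 unreserved characters pass through.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Usage: bark_generate_url TEXT [VOICE] [SPEED]
# VOICE and SPEED are only appended when given; otherwise the request omits them.
bark_generate_url() {
  local url="$BARK_API_URL/api/generate?text=$(urlencode "$1")"
  [[ -n "${2:-}" ]] && url+="&voice=$(urlencode "$2")"
  [[ -n "${3:-}" ]] && url+="&speed=$(urlencode "$3")"
  printf '%s' "$url"
}
```

For example, `curl -fsS "$(bark_generate_url 'What color is the sky?' v2/en_speaker_1)" -o question.wav` requests the same text as the first example. The helper encodes the / in voice IDs as %2F, which a standard query-string parser decodes back to /. Alternatively, `curl -G --data-urlencode "text=..."` lets curl do the encoding itself.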
Available Voices
English Voices
v2/en_speaker_0 through v2/en_speaker_9
Spanish Voices
v2/es_speaker_0 through v2/es_speaker_9
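Because the presets follow a fixed naming pattern, it is easy to render one sample per voice when choosing a default. This is a hedged sketch: bark_voice_ids and audition_voices are hypothetical helpers. It also assumes /api/generate returns WAV audio, which the .wav file names in the examples suggest.

```shell
#!/usr/bin/env bash
# Print the ten Bark preset IDs for a language code from the lists above (en, es).
bark_voice_ids() {
  local lang="$1" i
  for i in {0..9}; do
    printf 'v2/%s_speaker_%d\n' "$lang" "$i"
  done
}

# Hypothetical audition helper: render the same sentence with every preset of a
# language into ./voice-samples/, one WAV file per voice, via /api/generate.
audition_voices() {
  local lang="$1" text="$2" voice
  mkdir -p voice-samples
  while IFS= read -r voice; do
    curl -fsS -G "${BARK_API_URL:-http://t-800:8000}/api/generate" \
      --data-urlencode "text=$text" \
      --data-urlencode "voice=$voice" \
      -o "voice-samples/${voice//\//_}.wav" || echo "failed: $voice" >&2
  done < <(bark_voice_ids "$lang")
}
```

`audition_voices es "Hola mundo"` would write files such as voice-samples/v2_es_speaker_0.wav. Each request runs a full generation, so auditioning ten voices may take a while without a GPU.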
Integration with OpenCCB
Generate Audio for a Question
# Via API
curl -X POST "http://localhost:3001/question-bank/{question_id}/generate-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"text": "What color is the sky?",
"voice": "v2/en_speaker_1",
"speed": 1.0
}'
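Hand-writing this JSON body breaks as soon as the question text contains a double quote or a newline. The minimal bash sketch below escapes those characters before building the payload. json_escape, audio_payload and generate_question_audio are hypothetical helpers. The OPENCCB_API_URL and OPENCCB_TOKEN variables are assumptions, not settings documented by OpenCCB.

```shell
#!/usr/bin/env bash
# Sketch: build and send the generate-audio request with safely escaped JSON.
# OPENCCB_API_URL and OPENCCB_TOKEN are hypothetical variable names.

# Escape backslashes, double quotes and common control characters for a JSON string.
json_escape() {
  local s="$1"
  s="${s//\\/\\\\}"      # backslash first, so later escapes are not doubled
  s="${s//\"/\\\"}"
  s="${s//$'\n'/\\n}"
  s="${s//$'\r'/\\r}"
  s="${s//$'\t'/\\t}"
  printf '%s' "$s"
}

# Usage: audio_payload TEXT [VOICE] [SPEED]; defaults mirror the curl example above.
# SPEED is written as a bare JSON number, so it must be numeric.
audio_payload() {
  local text="$1" voice="${2:-v2/en_speaker_1}" speed="${3:-1.0}"
  printf '{"text":"%s","voice":"%s","speed":%s}' \
    "$(json_escape "$text")" "$(json_escape "$voice")" "$speed"
}

# Usage: generate_question_audio QUESTION_ID TEXT [VOICE] [SPEED]
generate_question_audio() {
  local question_id="$1"; shift
  [[ -n "${OPENCCB_TOKEN:-}" ]] || { echo "OPENCCB_TOKEN is not set" >&2; return 1; }
  curl -fsS -X POST \
    "${OPENCCB_API_URL:-http://localhost:3001}/question-bank/${question_id}/generate-audio" \
    -H "Authorization: Bearer ${OPENCCB_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(audio_payload "$@")"
}
```

If jq is installed, `jq -n --arg text "$TEXT" ...` is a sturdier way to build the same body, since it handles every character JSON requires escaping.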
Automatic Audio Generation
When creating a question, set "generate_audio": true to trigger asynchronous audio generation:
POST /question-bank
{
"question_text": "What is the capital of France?",
"question_type": "multiple-choice",
"options": ["Paris", "London", "Berlin", "Madrid"],
"correct_answer": 0,
"explanation": "Paris is the capital of France.",
"generate_audio": true
}
Configuration
Environment Variables
Add to your .env file:
# Bark TTS API URL
BARK_API_URL=http://t-800:8000
# Optional: Default voice for audio generation
BARK_DEFAULT_VOICE=v2/en_speaker_1
# Optional: Default speed
BARK_DEFAULT_SPEED=1.0
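A quick sanity check of these values can catch typos before OpenCCB starts calling Bark. The sketch below assumes the .env file uses shell-compatible KEY=value lines so that it can be sourced. check_bark_env and its helpers are hypothetical, and the voice check only accepts the English and Spanish presets listed in this guide.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the Bark settings in .env before starting OpenCCB.
# Assumes .env contains shell-compatible KEY=value lines (it is sourced).

# Accepts only the presets listed in this guide (English and Spanish).
valid_bark_voice() {
  local re='^v2/(en|es)_speaker_[0-9]$'
  [[ "$1" =~ $re ]]
}

# Accepts positive decimals such as 1, 1.0 or 1.25. The speed range the Bark API
# actually accepts is not documented here, so no upper bound is enforced.
valid_bark_speed() {
  local num='^[0-9]+(\.[0-9]+)?$' zero='^0+(\.0+)?$'
  [[ "$1" =~ $num && ! "$1" =~ $zero ]]
}

check_bark_env() {
  local env_file="${1:-.env}" rc status=0
  set -a                 # export every variable the file assigns
  # shellcheck source=/dev/null
  source "$env_file"
  rc=$?
  set +a
  (( rc == 0 )) || { echo "cannot read $env_file" >&2; return 1; }

  [[ -n "${BARK_API_URL:-}" ]] || { echo "BARK_API_URL is not set" >&2; status=1; }
  valid_bark_voice "${BARK_DEFAULT_VOICE:-v2/en_speaker_1}" \
    || { echo "unexpected BARK_DEFAULT_VOICE: $BARK_DEFAULT_VOICE" >&2; status=1; }
  valid_bark_speed "${BARK_DEFAULT_SPEED:-1.0}" \
    || { echo "unexpected BARK_DEFAULT_SPEED: $BARK_DEFAULT_SPEED" >&2; status=1; }
  return "$status"
}
```

The optional variables fall back to the documented defaults (v2/en_speaker_1 and 1.0) when they are absent, so only BARK_API_URL is strictly required.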
Performance Optimization
Model Preloading
Bark preloads models on startup (takes ~30 seconds). The systemd service handles this automatically.
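Because of that startup delay, scripts that deploy or restart the service may want to wait until /health responds before sending generation requests. Here is a minimal sketch using curl. wait_for_health is a hypothetical helper, and its 90-second default timeout is an arbitrary margin over the ~30 seconds quoted here.

```shell
#!/usr/bin/env bash
# Sketch: poll the Bark /health endpoint until it answers or a timeout expires.
# Usage: wait_for_health [URL] [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_health() {
  local url="${1:-http://t-800:8000/health}"
  local timeout="${2:-90}" interval="${3:-3}"
  local start=$SECONDS
  # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
  until curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    if (( SECONDS - start >= timeout )); then
      echo "Bark API not healthy after ${timeout}s: $url" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "Bark API is up: $url"
}
```

A typical use is `sudo systemctl restart bark-tts && wait_for_health` before running a batch import.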
Memory Management
The systemd service includes memory limits:
MemoryMax=4G
MemoryHigh=3G
Adjust based on your server's capacity.
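Rather than editing the unit file directly, the limits can be overridden with a systemd drop-in. This keeps local tuning separate from the unit file the deploy script installs. The sketch assumes the unit is named bark-tts, as in the Troubleshooting commands, and the values are examples only. render_memory_override and apply_memory_override are hypothetical names. MemoryHigh should stay below MemoryMax so that systemd throttles the service before the hard limit triggers an OOM kill.

```shell
#!/usr/bin/env bash
# Sketch: override the bark-tts memory limits with a systemd drop-in file.
# Example values only; keep MemoryHigh below MemoryMax.

# Print drop-in contents for the given soft (MemoryHigh) and hard (MemoryMax) limits.
render_memory_override() {
  printf '[Service]\nMemoryHigh=%s\nMemoryMax=%s\n' "$1" "$2"
}

# Usage: apply_memory_override 5G 6G   (requires sudo on t-800)
apply_memory_override() {
  local dir=/etc/systemd/system/bark-tts.service.d
  sudo mkdir -p "$dir"
  render_memory_override "$1" "$2" | sudo tee "$dir/memory.conf" >/dev/null
  sudo systemctl daemon-reload
  sudo systemctl restart bark-tts
  # Show the limits systemd now applies (reported in bytes).
  systemctl show bark-tts -p MemoryHigh -p MemoryMax
}
```

`sudo systemctl edit bark-tts` creates the same kind of drop-in interactively, if you prefer an editor.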
Batch Generation
For importing many questions:
# Generate audio for multiple questions
curl "http://t-800:8000/api/generate/batch?texts=Question%201&texts=Question%202&voice=v2/en_speaker_1"
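For larger imports, building that query string by hand becomes error-prone. The sketch below reads one question per line from a text file and lets curl encode each one. build_batch_args and bark_batch are hypothetical helpers. This guide does not describe the batch response format, so the response is simply saved to a file.

```shell
#!/usr/bin/env bash
# Sketch: send each non-empty line of a text file as a repeated "texts" parameter
# to /api/generate/batch. curl's -G turns the --data-urlencode pairs into a
# percent-encoded query string, so question text needs no manual escaping.

# Fills the global array BATCH_ARGS with curl arguments for FILE and VOICE.
build_batch_args() {
  local file="$1" voice="$2" line
  BATCH_ARGS=()
  while IFS= read -r line || [[ -n "$line" ]]; do   # also keep a final line without "\n"
    [[ -n "$line" ]] && BATCH_ARGS+=(--data-urlencode "texts=$line")
  done < "$file"
  BATCH_ARGS+=(--data-urlencode "voice=$voice")
}

# Usage: bark_batch questions.txt [VOICE] [OUTPUT_FILE]
# The batch response format is not described in this guide, so it is saved untouched.
bark_batch() {
  local file="$1" voice="${2:-v2/en_speaker_1}" out="${3:-batch_response}"
  build_batch_args "$file" "$voice"
  curl -fsS -G "${BARK_API_URL:-http://t-800:8000}/api/generate/batch" \
    "${BATCH_ARGS[@]}" -o "$out"
}
```

Very long files can exceed URL length limits on the server or on an intermediate proxy, so splitting large imports into several smaller batches is a sensible precaution.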
Troubleshooting
Service Not Starting
# Check status
sudo systemctl status bark-tts
# View logs
sudo journalctl -u bark-tts -f
# Restart service
sudo systemctl restart bark-tts
Out of Memory
If Bark crashes due to memory:
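- Review MemoryMax in the systemd service: the service is OOM-killed when it exceeds this limit, so raise it if the host has RAM to spare
- Use smaller models: suno/bark-small
- Process questions one at a time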
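To confirm that memory really is the cause before changing anything, check why systemd last stopped the service. The hedged sketch below looks for the oom-kill result. If it finds one, it sets the upstream bark package's SUNO_USE_SMALL_MODELS environment variable through a drop-in. Whether the bark-tts API server honors that variable depends on how it loads its models; a server built on the Hugging Face suno/bark-small checkpoint, for instance, would select the model differently. Treat this as an assumption to verify. is_oom_result and diagnose_and_shrink are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check whether systemd OOM-killed bark-tts, then opt into Bark's small models.
# SUNO_USE_SMALL_MODELS comes from the upstream bark package; whether the bark-tts
# server honors it is an assumption to verify against its model-loading code.

# Succeeds when a `systemctl show -p Result` line reports an OOM kill.
is_oom_result() {
  [[ "$1" == "Result=oom-kill" ]]
}

diagnose_and_shrink() {
  local dropin_dir=/etc/systemd/system/bark-tts.service.d
  if is_oom_result "$(systemctl show bark-tts -p Result)"; then
    echo "bark-tts was OOM-killed; enabling small models"
    sudo mkdir -p "$dropin_dir"
    printf '[Service]\nEnvironment=SUNO_USE_SMALL_MODELS=True\n' |
      sudo tee "$dropin_dir/small-models.conf" >/dev/null
    sudo systemctl daemon-reload
    sudo systemctl restart bark-tts
  else
    echo "No OOM kill recorded; inspect: sudo journalctl -u bark-tts" >&2
    return 1
  fi
}
```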
Slow Generation
- GPU acceleration: Install CUDA-enabled PyTorch
- Reduce audio quality settings
- Use shorter text segments
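One way to act on the shorter-segments advice is to send one sentence per request. A minimal pure-bash splitter is sketched below. split_sentences is a hypothetical helper, and its rule (break after ., ! or ? followed by whitespace) is a simple heuristic. It keeps decimals like 3.5 intact but will also break after abbreviations such as "e.g.".

```shell
#!/usr/bin/env bash
# Sketch: split question text into sentences, one per output line.
split_sentences() {
  local text="$1" sentence="" c next i
  for (( i = 0; i < ${#text}; i++ )); do
    c="${text:i:1}"
    next="${text:i+1:1}"
    sentence+="$c"
    # End a sentence at ".", "!" or "?" followed by whitespace or the end of the text,
    # so decimals such as "3.5" stay intact.
    if [[ $c == [.!?] && ( -z $next || $next == [[:space:]] ) ]]; then
      printf '%s\n' "${sentence#"${sentence%%[![:space:]]*}"}"   # strip leading whitespace
      sentence=""
    fi
  done
  # Emit any trailing text that does not end with punctuation.
  sentence="${sentence#"${sentence%%[![:space:]]*}"}"
  [[ -n $sentence ]] && printf '%s\n' "$sentence"
  return 0
}
```

For the GPU point, `python3 -c 'import torch; print(torch.cuda.is_available())'` prints True when the installed PyTorch build can see a CUDA device.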
Testing
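# Test English voice (play is part of SoX; -t wav declares the piped format)
curl -s "http://t-800:8000/api/generate?text=The%20quick%20brown%20fox&voice=v2/en_speaker_1" | play -t wav -
# Test Spanish voice (--data-urlencode percent-encodes the accented characters)
curl -sG "http://t-800:8000/api/generate" --data-urlencode "text=El rápido zorro marrón" --data-urlencode "voice=v2/es_speaker_0" | play -t wav -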
Security Notes
- Bark API runs on internal network only
- No authentication required (assumes trusted network)
- Rate limiting handled by OpenCCB
- Audio files are stored in the uploads/audio/ directory
Future Enhancements
- Add authentication to Bark API
- Support for custom voice cloning
- Audio preprocessing (noise reduction, normalization)
- Caching layer for repeated requests
- WebSocket support for streaming audio