222 lines
5.0 KiB
Markdown
222 lines
5.0 KiB
Markdown
# Bark TTS Integration Guide
|
|
|
|
## Overview
|
|
|
|
OpenCCB now integrates with **Suno AI's Bark** text-to-speech system for generating audio versions of questions. This allows students to listen to questions instead of just reading them, improving accessibility and supporting different learning styles.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────┐ HTTP ┌─────────────────┐
|
|
│ OpenCCB CMS │ ────────────> │ Bark TTS API │
|
|
│ (PostgreSQL) │ <──────────── │ (Server t-800)│
|
|
│ │ Audio │ │
|
|
└─────────────────┘ └─────────────────┘
|
|
```
|
|
|
|
## Deployment to t-800 Server
|
|
|
|
### Prerequisites
|
|
|
|
- SSH access to t-800 server
|
|
- At least 8GB RAM recommended (Bark loads large models)
|
|
- 10GB free disk space
|
|
- Python 3.8+
|
|
- GPU optional (CUDA support for faster generation)
|
|
|
|
### Quick Deploy
|
|
|
|
```bash
|
|
# From your local machine
|
|
cd /home/juan/dev/openccb
|
|
./scripts/deploy_to_t800.sh
|
|
```
|
|
|
|
This will:
|
|
1. SSH into t-800
|
|
2. Install Python dependencies
|
|
3. Clone Bark repository
|
|
4. Set up systemd service
|
|
5. Start the API server
|
|
|
|
### Manual Deploy
|
|
|
|
```bash
|
|
# SSH into t-800
|
|
ssh juan@t-800
|
|
|
|
# Run installation script
|
|
wget https://raw.githubusercontent.com/suno-ai/bark/main/scripts/install.sh
|
|
sudo bash install.sh
|
|
```
|
|
|
|
## API Endpoints
|
|
|
|
Once deployed, Bark API is available at `http://t-800:8000`
|
|
|
|
### Health Check
|
|
```bash
|
|
curl http://t-800:8000/health
|
|
```
|
|
|
|
### List Available Voices
|
|
```bash
|
|
curl http://t-800:8000/api/voices
|
|
```
|
|
|
|
### Generate Speech
|
|
```bash
|
|
# Basic usage
|
|
curl "http://t-800:8000/api/generate?text=What%20color%20is%20the%20sky%3F" \
|
|
-o question.wav
|
|
|
|
# With specific voice and speed
|
|
curl "http://t-800:8000/api/generate?text=Hello%20World&voice=v2/en_speaker_6&speed=1.2" \
|
|
-o greeting.wav
|
|
|
|
# Spanish voice
|
|
curl "http://t-800:8000/api/generate?text=Hola%20mundo&voice=v2/es_speaker_0" \
|
|
-o saludo.wav
|
|
```
|
|
|
|
## Available Voices
|
|
|
|
### English Voices
|
|
- `v2/en_speaker_0` through `v2/en_speaker_9`
|
|
|
|
### Spanish Voices
|
|
- `v2/es_speaker_0` through `v2/es_speaker_9`
|
|
|
|
## Integration with OpenCCB
|
|
|
|
### Generate Audio for a Question
|
|
|
|
```bash
|
|
# Via API
|
|
curl -X POST "http://localhost:3001/question-bank/{question_id}/generate-audio" \
|
|
-H "Authorization: Bearer YOUR_TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"text": "What color is the sky?",
|
|
"voice": "v2/en_speaker_1",
|
|
"speed": 1.0
|
|
}'
|
|
```
|
|
|
|
### Automatic Audio Generation
|
|
|
|
When creating a question:
|
|
|
|
```json
|
|
POST /question-bank
|
|
{
|
|
"question_text": "What is the capital of France?",
|
|
"question_type": "multiple-choice",
|
|
"options": ["Paris", "London", "Berlin", "Madrid"],
|
|
"correct_answer": 0,
|
|
"explanation": "Paris is the capital of France.",
|
|
"generate_audio": true // Triggers async audio generation
|
|
}
|
|
```
|
|
|
|
## Configuration
|
|
|
|
### Environment Variables
|
|
|
|
Add to your `.env` file:
|
|
|
|
```bash
|
|
# Bark TTS API URL
|
|
BARK_API_URL=http://t-800:8000
|
|
|
|
# Optional: Default voice for audio generation
|
|
BARK_DEFAULT_VOICE=v2/en_speaker_1
|
|
|
|
# Optional: Default speed
|
|
BARK_DEFAULT_SPEED=1.0
|
|
```
|
|
|
|
## Performance Optimization
|
|
|
|
### Model Preloading
|
|
|
|
Bark preloads models on startup (takes ~30 seconds). The systemd service handles this automatically.
|
|
|
|
### Memory Management
|
|
|
|
The systemd service includes memory limits:
|
|
```ini
|
|
MemoryMax=4G
|
|
MemoryHigh=3G
|
|
```
|
|
|
|
Adjust based on your server's capacity.
|
|
|
|
### Batch Generation
|
|
|
|
For importing many questions:
|
|
|
|
```bash
|
|
# Generate audio for multiple questions
|
|
curl "http://t-800:8000/api/generate/batch?texts=Question%201&texts=Question%202&voice=v2/en_speaker_1"
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Service Not Starting
|
|
|
|
```bash
|
|
# Check status
|
|
sudo systemctl status bark-tts
|
|
|
|
# View logs
|
|
sudo journalctl -u bark-tts -f
|
|
|
|
# Restart service
|
|
sudo systemctl restart bark-tts
|
|
```
|
|
|
|
### Out of Memory
|
|
|
|
If Bark crashes due to memory:
|
|
1. Reduce `MemoryMax` in systemd service
|
|
2. Use smaller models: `suno/bark-small`
|
|
3. Process questions one at a time
|
|
|
|
### Slow Generation
|
|
|
|
- GPU acceleration: Install CUDA-enabled PyTorch
|
|
- Reduce audio quality settings
|
|
- Use shorter text segments
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
# Test English voice
|
|
curl "http://t-800:8000/api/generate?text=The%20quick%20brown%20fox&voice=v2/en_speaker_1" | play -
|
|
|
|
# Test Spanish voice
|
|
curl "http://t-800:8000/api/generate?text=El%20rápido%20zorro%20marrón&voice=v2/es_speaker_0" | play -
|
|
```
|
|
|
|
## Security Notes
|
|
|
|
- Bark API runs on internal network only
|
|
- No authentication required (assumes trusted network)
|
|
- Rate limiting handled by OpenCCB
|
|
- Audio files stored in `uploads/audio/` directory
|
|
|
|
## Future Enhancements
|
|
|
|
- [ ] Add authentication to Bark API
|
|
- [ ] Support for custom voice cloning
|
|
- [ ] Audio preprocessing (noise reduction, normalization)
|
|
- [ ] Caching layer for repeated requests
|
|
- [ ] WebSocket support for streaming audio
|
|
|
|
## References
|
|
|
|
- [Bark GitHub](https://github.com/suno-ai/bark)
|
|
- [Bark Hugging Face](https://huggingface.co/suno/bark)
|
|
- [OpenCCB Question Bank Documentation](../docs/question-bank.md)
|