feat: token count implement

2026-03-17 12:07:56 -03:00
parent 41279585f6
commit be699ad6ab
44 changed files with 9032 additions and 167 deletions
@@ -0,0 +1,221 @@
# Bark TTS Integration Guide
## Overview
OpenCCB now integrates with **Suno AI's Bark** text-to-speech system for generating audio versions of questions. This allows students to listen to questions instead of just reading them, improving accessibility and supporting different learning styles.
## Architecture
```
┌─────────────────┐     HTTP      ┌─────────────────┐
│   OpenCCB CMS   │ ────────────> │  Bark TTS API   │
│  (PostgreSQL)   │ <──────────── │ (Server t-800)  │
│                 │     Audio     │                 │
└─────────────────┘               └─────────────────┘
```
## Deployment to t-800 Server
### Prerequisites
- SSH access to t-800 server
- At least 8GB RAM recommended (Bark loads large models)
- 10GB free disk space
- Python 3.8+
- GPU optional (CUDA support for faster generation)
### Quick Deploy
```bash
# From your local machine
cd /home/juan/dev/openccb
./scripts/deploy_to_t800.sh
```
This will:
1. SSH into t-800
2. Install Python dependencies
3. Clone Bark repository
4. Set up systemd service
5. Start the API server
### Manual Deploy
```bash
# SSH into t-800
ssh juan@t-800
# Run installation script
wget https://raw.githubusercontent.com/suno-ai/bark/main/scripts/install.sh
sudo bash install.sh
```
## API Endpoints
Once deployed, the Bark API is available at `http://t-800:8000`.
### Health Check
```bash
curl http://t-800:8000/health
```
### List Available Voices
```bash
curl http://t-800:8000/api/voices
```
### Generate Speech
```bash
# Basic usage
curl "http://t-800:8000/api/generate?text=What%20color%20is%20the%20sky%3F" \
-o question.wav
# With specific voice and speed
curl "http://t-800:8000/api/generate?text=Hello%20World&voice=v2/en_speaker_6&speed=1.2" \
-o greeting.wav
# Spanish voice
curl "http://t-800:8000/api/generate?text=Hola%20mundo&voice=v2/es_speaker_0" \
-o saludo.wav
```
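The curl calls above percent-encode the query string by hand. A minimal Python sketch of the same requests follows, assuming the `/api/generate` endpoint and the `text`, `voice`, and `speed` parameters behave as shown in this guide. The helper names `build_generate_url` and `fetch_speech` and the 120-second timeout are hypothetical, not part of OpenCCB or Bark.

```python
import urllib.request
from urllib.parse import quote, urlencode

BARK_API_URL = "http://t-800:8000"  # base URL used throughout this guide


def build_generate_url(text, voice=None, speed=None, base_url=BARK_API_URL):
    """Return a /api/generate URL with a percent-encoded query string."""
    params = {"text": text}
    if voice is not None:
        params["voice"] = voice
    if speed is not None:
        params["speed"] = str(speed)
    # quote_via=quote encodes spaces as %20 (not "+"); safe="/" leaves the
    # slash in voice IDs such as v2/en_speaker_6 unescaped, as in the curl calls.
    query = urlencode(params, quote_via=quote, safe="/")
    return f"{base_url}/api/generate?{query}"


def fetch_speech(text, out_path, **options):
    """Download the generated audio to out_path (performs a network call)."""
    url = build_generate_url(text, **options)
    # The timeout is an assumption; generation time depends on the hardware.
    with urllib.request.urlopen(url, timeout=120) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
```

Building the URL separately from the download keeps the encoding logic easy to check without a running server. For example, `fetch_speech("Hola mundo", "saludo.wav", voice="v2/es_speaker_0")` would mirror the Spanish curl call.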
## Available Voices
### English Voices
- `v2/en_speaker_0` through `v2/en_speaker_9`
### Spanish Voices
- `v2/es_speaker_0` through `v2/es_speaker_9`
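Because the preset names follow a fixed pattern, a client can enumerate or check them locally before calling the API. The sketch below assumes only the two languages listed in this guide; `voice_ids` and `is_known_voice` are hypothetical helper names, and the authoritative list remains whatever `/api/voices` returns.

```python
SUPPORTED_LANGUAGES = ("en", "es")  # only the languages listed in this guide


def voice_ids(language):
    """Return the ten preset IDs for a language code such as "en" or "es"."""
    return [f"v2/{language}_speaker_{i}" for i in range(10)]


def is_known_voice(voice):
    """Return True if voice matches one of the presets listed in this guide."""
    return any(voice in voice_ids(lang) for lang in SUPPORTED_LANGUAGES)
```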
## Integration with OpenCCB
### Generate Audio for a Question
```bash
# Via API
curl -X POST "http://localhost:3001/question-bank/{question_id}/generate-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"text": "What color is the sky?",
"voice": "v2/en_speaker_1",
"speed": 1.0
}'
```
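The same request can be issued from Python. This is a sketch using only the standard library, assuming the endpoint, headers, and JSON fields shown in the curl example. The function name `build_audio_request` is hypothetical, and the response format is not described in this guide, so the code stops at building the request.

```python
import json
import urllib.request

OPENCCB_URL = "http://localhost:3001"  # host and port from the curl example


def build_audio_request(question_id, token, text,
                        voice="v2/en_speaker_1", speed=1.0,
                        base_url=OPENCCB_URL):
    """Build the POST request that asks OpenCCB to generate question audio."""
    body = json.dumps({"text": text, "voice": voice, "speed": speed})
    return urllib.request.Request(
        f"{base_url}/question-bank/{question_id}/generate-audio",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Inspect the response before relying on any particular field in it.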
### Automatic Audio Generation
When creating a question, set `generate_audio` to `true` in the `POST /question-bank` request body to trigger asynchronous audio generation:
```json
{
  "question_text": "What is the capital of France?",
  "question_type": "multiple-choice",
  "options": ["Paris", "London", "Berlin", "Madrid"],
  "correct_answer": 0,
  "explanation": "Paris is the capital of France.",
  "generate_audio": true
}
```
## Configuration
### Environment Variables
Add to your `.env` file:
```bash
# Bark TTS API URL
BARK_API_URL=http://t-800:8000
# Optional: Default voice for audio generation
BARK_DEFAULT_VOICE=v2/en_speaker_1
# Optional: Default speed
BARK_DEFAULT_SPEED=1.0
```
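As an illustration of how a client might consume these variables, the sketch below reads them with fallbacks. It assumes `BARK_API_URL` is required (the guide marks only the other two as optional) and reuses the example values above as defaults. `BarkConfig` and `load_bark_config` are hypothetical names, not OpenCCB code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BarkConfig:
    api_url: str
    default_voice: str
    default_speed: float


def load_bark_config(env=None):
    """Read the Bark settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    api_url = env.get("BARK_API_URL")
    if not api_url:
        # Assumption: the URL has no sensible default, so fail early.
        raise ValueError("BARK_API_URL is not set")
    return BarkConfig(
        api_url=api_url.rstrip("/"),
        # Fallbacks below are the example values from this guide.
        default_voice=env.get("BARK_DEFAULT_VOICE", "v2/en_speaker_1"),
        default_speed=float(env.get("BARK_DEFAULT_SPEED", "1.0")),
    )
```

Accepting the environment as a parameter makes the loader testable without changing the process environment.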
## Performance Optimization
### Model Preloading
Bark preloads models on startup (takes ~30 seconds). The systemd service handles this automatically.
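Clients that start alongside the service may want to wait for the preload to finish. The sketch below polls the `/health` endpoint from this guide until it answers, assuming an HTTP 200 response means the models are loaded. The function names, the retry count, and the delay are hypothetical choices.

```python
import time
import urllib.error
import urllib.request


def health_ok(base_url="http://t-800:8000", timeout=5.0):
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe=health_ok, attempts=12, delay=5.0, sleep=time.sleep):
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The defaults give the service roughly a minute to come up, which leaves headroom over the ~30 seconds quoted above.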
### Memory Management
The systemd service includes memory limits:
```ini
MemoryMax=4G
MemoryHigh=3G
```
Adjust based on your server's capacity.
### Batch Generation
For importing many questions:
```bash
# Generate audio for multiple questions
curl "http://t-800:8000/api/generate/batch?texts=Question%201&texts=Question%202&voice=v2/en_speaker_1"
```
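The batch URL repeats the `texts` parameter once per question. A small Python sketch of building it follows, assuming the `/api/generate/batch` endpoint accepts repeated `texts` values as in the curl call above. `build_batch_url` is a hypothetical helper, and the format of the batch response is not covered by this guide.

```python
from urllib.parse import quote, urlencode


def build_batch_url(texts, voice=None, base_url="http://t-800:8000"):
    """Return a batch URL with one percent-encoded `texts` entry per question."""
    # A list of pairs lets the same key appear several times, in order.
    params = [("texts", text) for text in texts]
    if voice is not None:
        params.append(("voice", voice))
    query = urlencode(params, quote_via=quote, safe="/")
    return f"{base_url}/api/generate/batch?{query}"
```

Query strings have practical length limits, so a large import may need to be split into several smaller batches.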
## Troubleshooting
### Service Not Starting
```bash
# Check status
sudo systemctl status bark-tts
# View logs
sudo journalctl -u bark-tts -f
# Restart service
sudo systemctl restart bark-tts
```
### Out of Memory
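If Bark crashes because it runs out of memory:
1. Raise `MemoryMax` in the systemd service if the server has RAM to spare (lowering it only makes the service hit the limit sooner)
2. Use the smaller model: `suno/bark-small`
3. Process questions one at a time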
### Slow Generation
- GPU acceleration: Install CUDA-enabled PyTorch
- Reduce audio quality settings
- Use shorter text segments
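One way to apply the last suggestion is to split long question text at sentence boundaries and request audio per segment. This is a minimal sketch; the 200-character limit is an arbitrary assumption rather than a Bark setting, and `split_into_segments` is a hypothetical helper.

```python
import re


def split_into_segments(text, max_chars=200):
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own generate request, and the resulting audio files played or concatenated in order.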
## Testing
```bash
# Test English voice (requires SoX for the `play` command)
curl -s "http://t-800:8000/api/generate?text=The%20quick%20brown%20fox&voice=v2/en_speaker_1" | play -t wav -
# Test Spanish voice (non-ASCII characters must be percent-encoded)
curl -s "http://t-800:8000/api/generate?text=El%20r%C3%A1pido%20zorro%20marr%C3%B3n&voice=v2/es_speaker_0" | play -t wav -
```
## Security Notes
- Bark API runs on internal network only
- No authentication required (assumes trusted network)
- Rate limiting handled by OpenCCB
- Audio files stored in `uploads/audio/` directory
## Future Enhancements
- [ ] Add authentication to Bark API
- [ ] Support for custom voice cloning
- [ ] Audio preprocessing (noise reduction, normalization)
- [ ] Caching layer for repeated requests
- [ ] WebSocket support for streaming audio
## References
- [Bark GitHub](https://github.com/suno-ai/bark)
- [Bark Hugging Face](https://huggingface.co/suno/bark)
- [OpenCCB Question Bank Documentation](../docs/question-bank.md)