PGVector Embeddings Implementation Guide
Overview
OpenCCB now includes semantic search capabilities using PostgreSQL's pgvector extension and Ollama's embedding models. This enables:
- Semantic question search - Find similar questions in the question bank
- Improved RAG for question generation - Generate questions based on semantic similarity
- Enhanced AI tutor chat - Better context retrieval from knowledge base
Architecture
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│   User Query    │────▶│    Ollama    │────▶│    Embedding    │
│     (text)      │     │ (embeddings) │     │  Vector (384)   │
└─────────────────┘     └──────────────┘     └────────┬────────┘
                                                      │
                                                      ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│ Search Results  │◀────│  PostgreSQL  │◀────│    pgvector     │
│ (similar items) │     │  + pgvector  │     │  cosine search  │
└─────────────────┘     └──────────────┘     └─────────────────┘
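To make the flow in the diagram concrete, here is a minimal Python sketch of its first and last steps: requesting an embedding from Ollama and scoring two vectors by cosine similarity. It assumes Ollama's `/api/embeddings` endpoint (request fields `model` and `prompt`, response field `embedding`) and the `LOCAL_OLLAMA_URL` value shown later in this guide. The function names are illustrative and not part of OpenCCB.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: same value as LOCAL_OLLAMA_URL


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Ask Ollama for the embedding vector of `text` (needs a running Ollama)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

In the real pipeline the similarity is computed inside PostgreSQL by pgvector; the pure-Python version only illustrates what the score means.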
Installation
1. Update Docker Compose
Change the database image to include pgvector:
# docker-compose.yml
services:
  db:
    image: pgvector/pgvector:pg16 # Was: postgres:16-alpine
2. Pull Embedding Model
docker pull ollama/ollama:latest
docker exec -it ollama ollama pull nomic-embed-text
3. Run Migrations
# CMS migrations (question_bank embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_cms \
sqlx migrate run --source services/cms-service/migrations
# LMS migrations (knowledge_base embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_lms \
sqlx migrate run --source services/lms-service/migrations
4. Generate Embeddings
After migration, generate embeddings for existing data:
# Generate question embeddings
curl -X POST http://localhost:3001/question-bank/embeddings/generate \
-H "Authorization: Bearer YOUR_TOKEN"
# Generate knowledge base embeddings
curl -X POST http://localhost:3002/knowledge-base/embeddings/generate \
-H "Authorization: Bearer YOUR_TOKEN"
API Endpoints
CMS (Port 3001)
| Method | Endpoint | Description |
|---|---|---|
| POST | /question-bank/embeddings/generate | Generate embeddings for all questions without them |
| POST | /question-bank/{id}/embedding/regenerate | Regenerate embedding for a specific question |
| GET | /question-bank/semantic-search?query=... | Search questions by semantic similarity |
| GET | /question-bank/similar/{id}?threshold=0.85 | Find questions similar to a given question |
LMS (Port 3002)
| Method | Endpoint | Description |
|---|---|---|
| POST | /knowledge-base/embeddings/generate | Generate embeddings for knowledge base entries |
| POST | /knowledge-base/{id}/embedding/regenerate | Regenerate embedding for a specific entry |
| GET | /knowledge-base/semantic-search?query=... | Search knowledge base semantically |
Configuration
Environment Variables
# .env
LOCAL_OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
Supported Embedding Models
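| Model | Dimensions | Speed | Quality | Recommended use |
|---|---|---|---|---|
| nomic-embed-text | 768 | Fast | Good | ✅ Default |
| mxbai-embed-large | 1024 | Medium | Better | For higher accuracy |
| all-minilm | 384 | Very Fast | Good | For resource-constrained environments |
The vector column's declared dimension must match the model's output dimension. The schema in this guide declares vector(384), which matches all-minilm; using nomic-embed-text or mxbai-embed-large requires a vector(768) or vector(1024) column instead.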
Pull models with:
ollama pull nomic-embed-text
ollama pull mxbai-embed-large
ollama pull all-minilm
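Because pgvector rejects vectors whose length differs from the column's declared dimension, it can help to validate the configured model against the column before generating embeddings. The sketch below is one way to do that; the dimensions come from the table above, and `check_dimension` is a hypothetical helper, not an OpenCCB function.

```python
# Output dimensions of the models listed in this guide.
MODEL_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def check_dimension(model: str, column_dim: int) -> None:
    """Raise if `model` produces vectors that will not fit a vector(column_dim) column."""
    if model not in MODEL_DIMENSIONS:
        raise KeyError(f"unknown embedding model: {model}")
    expected = MODEL_DIMENSIONS[model]
    if expected != column_dim:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the column is declared as vector({column_dim})"
        )
```

For example, `check_dimension("all-minilm", 384)` passes silently, while `check_dimension("nomic-embed-text", 384)` raises a `ValueError`.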
Usage Examples
1. Semantic Question Search
curl -G "http://localhost:3001/question-bank/semantic-search" \
--data-urlencode "query=questions about past tense verbs" \
-d "limit=10" \
-d "threshold=0.6" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
[
  {
    "id": "uuid-here",
    "question_text": "Choose the correct past tense of 'to go'",
    "question_type": "multiple-choice",
    "similarity": 0.87,
    "tags": ["grammar", "past-tense"],
    "difficulty": "medium",
    "points": 1
  }
]
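The same request can be issued from a script. The sketch below builds the query URL with proper encoding and extracts the best matches from a response shaped like the one above. The helper names and the client-side re-filtering are illustrative assumptions; the endpoint path and the field names (`question_text`, `similarity`) are taken from this guide.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, limit: int = 10,
                     threshold: float = 0.6) -> str:
    """Build the semantic-search URL with the query string URL-encoded."""
    params = urlencode({"query": query, "limit": limit, "threshold": threshold})
    return f"{base_url}/question-bank/semantic-search?{params}"


def top_matches(response_body: str, min_similarity: float) -> list[tuple[str, float]]:
    """Return (question_text, similarity) pairs at or above min_similarity, best first."""
    items = json.loads(response_body)
    matches = [
        (item["question_text"], item["similarity"])
        for item in items
        if item["similarity"] >= min_similarity
    ]
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

The URL would then be fetched with the same `Authorization: Bearer` header used in the curl example.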
2. Find Duplicate Questions
curl -G "http://localhost:3001/question-bank/similar/{question-id}" \
-d "threshold=0.95" \
-H "Authorization: Bearer YOUR_TOKEN"
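The endpoint reports near-duplicates of a single question. To review duplicates across the whole bank, the per-question results can be merged into groups. The sketch below assumes you have already collected, for each question id, the ids returned by `/question-bank/similar/{id}`; the input shape and the `duplicate_groups` name are hypothetical.

```python
def duplicate_groups(similar: dict[str, list[str]]) -> list[list[str]]:
    """Merge pairwise 'similar' results into groups using union-find.

    `similar` maps a question id to the ids reported as similar to it.
    Questions with no duplicates are omitted from the result.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for question_id, others in similar.items():
        find(question_id)
        for other in others:
            root_a, root_b = find(question_id), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: dict[str, list[str]] = {}
    for question_id in parent:
        groups.setdefault(find(question_id), []).append(question_id)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Grouping is transitive: if A resembles B and B resembles C, all three land in one group even when A and C fall below the threshold, so a high threshold such as 0.95 keeps groups tight.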
3. RAG Question Generation (Enhanced)
curl -X POST "http://localhost:3001/test-templates/generate-with-rag" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{
"topic": "present perfect tense",
"num_questions": 5
}'
This endpoint now uses semantic search to retrieve relevant questions from the bank, rather than relying on keyword matching alone.
Performance Considerations
Index Tuning
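The migrations create IVFFlat indexes with the default lists = 100, which is sized for tables in the tens of thousands of rows. For larger datasets, raise lists; pgvector's documentation suggests starting at rows / 1000 for up to 1M rows and sqrt(rows) beyond that:
-- For much larger tables, increase the lists parameter (1000 suits roughly 1M rows)
DROP INDEX IF EXISTS idx_question_embeddings;
CREATE INDEX idx_question_embeddings
ON question_bank
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000); -- Default: 100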
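As a rough guide for choosing the value, the sketch below encodes the starting points suggested in pgvector's documentation: rows / 1000 lists for up to 1M rows, sqrt(rows) above that, and sqrt(lists) probes at query time. Treat the results as initial values to benchmark, not as tuned settings; the function names are illustrative.

```python
import math


def recommended_lists(row_count: int) -> int:
    """Starting value for the IVFFlat `lists` parameter."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def recommended_probes(lists: int) -> int:
    """Starting value for `ivfflat.probes` (higher improves recall, lowers speed)."""
    return max(1, math.isqrt(lists))
```

By this rule a 100k-row table maps to the default of 100 lists, and a 1M-row table maps to 1000.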
Embedding Generation Speed
- ~50ms per embedding with Ollama (local)
- Batch generation: 100 questions ≈ 5 seconds
- Recommended: Generate embeddings in background during off-peak hours
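The figures above follow from simple arithmetic: 100 embeddings at about 50 ms each take about 5 seconds when generated sequentially. A small sketch for estimating a backfill and splitting it into batches is shown below; the 50 ms default is this guide's estimate and the helper names are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def estimated_seconds(count: int, ms_per_embedding: float = 50.0) -> float:
    """Estimate sequential generation time for `count` embeddings."""
    return count * ms_per_embedding / 1000.0


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

At the same rate, a bank of 10,000 questions would need roughly 500 seconds, which is why a background job during off-peak hours is the safer choice.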
Query Performance
| Operation | Without Index | With IVFFlat |
|---|---|---|
| Similarity search (10k rows) | ~500ms | ~20ms |
| Similarity search (100k rows) | ~5s | ~50ms |
Hybrid Search Strategy
The implementation uses a hybrid approach:
- First: Try semantic search with embeddings (most accurate)
- Fallback: Full-text search with tsvector (if embeddings unavailable)
This ensures the system works even if:
- Ollama is temporarily unavailable
- Embeddings haven't been generated yet
- You want to minimize latency for simple queries
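The fallback logic described above can be sketched as follows. This is not OpenCCB's actual implementation: the three callables are hypothetical stand-ins for the embedding request, the pgvector query, and the tsvector query, and the sketch treats any `OSError` (which covers connection failures and timeouts) as "Ollama unavailable".

```python
from typing import Callable


def hybrid_search(
    query: str,
    embed: Callable[[str], list[float]],
    semantic_search: Callable[[list[float]], list[dict]],
    fulltext_search: Callable[[str], list[dict]],
) -> tuple[list[dict], str]:
    """Try semantic search first; fall back to full-text search.

    Returns the results together with the strategy that produced them.
    """
    try:
        vector = embed(query)
    except OSError:
        vector = None  # assumed failure mode: Ollama is unreachable

    if vector is not None:
        results = semantic_search(vector)
        if results:  # empty when no rows have embeddings yet
            return results, "semantic"

    return fulltext_search(query), "fulltext"
```

Returning the strategy name alongside the results makes it easy to log how often the fallback path is taken.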
Database Schema
Question Bank (CMS)
ALTER TABLE question_bank
ADD COLUMN embedding vector(384),
ADD COLUMN embedding_updated_at TIMESTAMPTZ;
CREATE INDEX idx_question_embeddings
ON question_bank
USING ivfflat (embedding vector_cosine_ops);
Knowledge Base (LMS)
ALTER TABLE knowledge_base
ADD COLUMN embedding vector(384),
ADD COLUMN embedding_updated_at TIMESTAMPTZ;
CREATE INDEX idx_knowledge_base_embeddings
ON knowledge_base
USING ivfflat (embedding vector_cosine_ops);
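With this schema, a similarity query orders rows by pgvector's cosine distance operator `<=>`, and the similarity score is 1 minus that distance. The sketch below shows a query string and a helper that formats a Python list as a pgvector text literal. The `%s` placeholders assume a psycopg-style driver, and the `question_text` column is assumed from the API response shown earlier; neither is confirmed by the migrations.

```python
from typing import Iterable

# Assumed query shape; pass (vector_literal, vector_literal, limit) as parameters.
SIMILARITY_SQL = """
SELECT id, question_text, 1 - (embedding <=> %s::vector) AS similarity
FROM question_bank
WHERE embedding IS NOT NULL
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(values: Iterable[float]) -> str:
    """Format numbers as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def distance_to_similarity(cosine_distance: float) -> float:
    """Convert the cosine distance returned by <=> into cosine similarity."""
    return 1.0 - cosine_distance
```

Ordering by the raw distance expression with a LIMIT, rather than by the computed similarity, is what allows the planner to use the IVFFlat index.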
Troubleshooting
"extension 'vector' does not exist"
Make sure you're using the pgvector Docker image:
docker-compose pull db
docker-compose down
docker-compose up -d db
Slow semantic search
- Check if index exists:
SELECT indexname FROM pg_indexes WHERE tablename = 'question_bank';
- Verify index is being used:
EXPLAIN ANALYZE SELECT * FROM question_bank
ORDER BY embedding <=> '[...]'::vector LIMIT 10;
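When checking many tables or running this in CI, the plan text can be inspected programmatically. The sketch below looks for PostgreSQL's standard "Index Scan using <name>" wording in EXPLAIN output; it is a plain string check, so treat it as a convenience and read the full plan when in doubt.

```python
def uses_index(plan_text: str, index_name: str) -> bool:
    """Return True if the EXPLAIN output mentions an index scan on `index_name`."""
    return f"Index Scan using {index_name}" in plan_text


def falls_back_to_seq_scan(plan_text: str, table_name: str) -> bool:
    """Return True if the plan reads the whole table sequentially."""
    return f"Seq Scan on {table_name}" in plan_text
```

If the plan shows a sequential scan followed by a sort, the query is computing the distance for every row instead of using the index.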
Embeddings not generating
- Check Ollama is running:
curl http://localhost:11434/api/tags
- Verify model is available:
ollama list | grep nomic-embed
- Check logs for errors:
docker logs openccb-studio-1 | grep -i embedding
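The first two checks can also be scripted. The sketch below assumes the `/api/tags` response contains a `models` list whose entries have a `name` such as `nomic-embed-text:latest`, and it matches either the full name or the name without its tag. `fetch_tags` requires a running Ollama; both function names are illustrative.

```python
import json
import urllib.request


def fetch_tags(ollama_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from Ollama."""
    with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=10) as response:
        return json.load(response)


def model_available(tags_response: dict, model: str) -> bool:
    """Check whether `model` appears in an /api/tags response."""
    names = [entry.get("name", "") for entry in tags_response.get("models", [])]
    return any(
        name == model or name.split(":", 1)[0] == model
        for name in names
    )
```

A typical check is `model_available(fetch_tags(), "nomic-embed-text")`; if `fetch_tags` raises a connection error, Ollama itself is not reachable.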
Future Enhancements
Potential improvements:
- Multi-vector search - Combine title, question, and explanation embeddings
- Cross-lingual embeddings - Support Spanish/English/Portuguese semantic search
- Query rewriting - Use LLM to improve search queries before embedding
- Caching - Cache common query embeddings for faster response
- Analytics - Track which questions are most similar/related