# PGVector Embeddings Implementation Guide

## Overview

OpenCCB now includes **semantic search capabilities** using PostgreSQL's `pgvector` extension and Ollama's embedding models. This enables:

1. **Semantic question search** - Find similar questions in the question bank
2. **Improved RAG for question generation** - Generate questions based on semantic similarity
3. **Enhanced AI tutor chat** - Better context retrieval from the knowledge base

## Architecture

```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│   User Query    │────▶│   Ollama     │────▶│   Embedding     │
│   (text)        │     │  (embeddings)│     │   Vector (384)  │
└─────────────────┘     └──────────────┘     └────────┬────────┘
                                                      │
                                                      ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Search Results │◀────│  PostgreSQL  │◀────│   pgvector      │
│  (similar items)│     │  + pgvector  │     │   cosine search │
└─────────────────┘     └──────────────┘     └─────────────────┘
```

## Installation

### 1. Update Docker Compose

Change the database image to one that includes pgvector:

```yaml
# docker-compose.yml
services:
  db:
    image: pgvector/pgvector:pg16  # Was: postgres:16-alpine
```

### 2. Pull Embedding Model

```bash
docker pull ollama/ollama:latest
# Requires a running container named "ollama"
docker exec -it ollama ollama pull nomic-embed-text
```

### 3. Run Migrations

```bash
# CMS migrations (question_bank embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_cms \
  sqlx migrate run --source services/cms-service/migrations

# LMS migrations (knowledge_base embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_lms \
  sqlx migrate run --source services/lms-service/migrations
```

### 4. Generate Embeddings

After the migrations have run, generate embeddings for existing data:

```bash
# Generate question embeddings
curl -X POST http://localhost:3001/question-bank/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"

# Generate knowledge base embeddings
curl -X POST http://localhost:3002/knowledge-base/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"
```

## API Endpoints

### CMS (Port 3001)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/question-bank/embeddings/generate` | Generate embeddings for all questions without them |
| POST | `/question-bank/{id}/embedding/regenerate` | Regenerate embedding for a specific question |
| GET | `/question-bank/semantic-search?query=...` | Search questions by semantic similarity |
| GET | `/question-bank/similar/{id}?threshold=0.85` | Find questions similar to a given question |

### LMS (Port 3002)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/knowledge-base/embeddings/generate` | Generate embeddings for knowledge base entries |
| POST | `/knowledge-base/{id}/embedding/regenerate` | Regenerate embedding for a specific entry |
| GET | `/knowledge-base/semantic-search?query=...` | Search knowledge base semantically |

## Configuration

### Environment Variables

```bash
# .env
LOCAL_OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
```

### Supported Embedding Models

| Model | Dimensions | Speed | Quality | Recommended |
|-------|------------|-------|---------|-------------|
| `nomic-embed-text` | 768 | Fast | Good | ✅ Default |
| `mxbai-embed-large` | 1024 | Medium | Better | For higher accuracy |
| `all-minilm` | 384 | Very Fast | Good | For resource-constrained |

Pull models with:

```bash
ollama pull nomic-embed-text
ollama pull mxbai-embed-large
ollama pull all-minilm
```

The `vector(N)` column dimension must equal the output dimension of the configured model, because pgvector rejects vectors of a different length. The schema shown in this guide declares `vector(384)`, which matches `all-minilm`. `nomic-embed-text` (768) and `mxbai-embed-large` (1024) need columns of the matching width, so check your migrations against `EMBEDDING_MODEL` before generating embeddings.

## Usage Examples

### 1. Semantic Question Search

```bash
# --data-urlencode escapes the spaces in the query string
curl -G "http://localhost:3001/question-bank/semantic-search" \
  --data-urlencode "query=questions about past tense verbs" \
  -d "limit=10" \
  -d "threshold=0.6" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Response:

```json
[
  {
    "id": "uuid-here",
    "question_text": "Choose the correct past tense of 'to go'",
    "question_type": "multiple-choice",
    "similarity": 0.87,
    "tags": ["grammar", "past-tense"],
    "difficulty": "medium",
    "points": 1
  }
]
```

### 2. Find Duplicate Questions

```bash
curl -G "http://localhost:3001/question-bank/similar/{question-id}" \
  -d "threshold=0.95" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

### 3. RAG Question Generation (Enhanced)

```bash
curl -X POST "http://localhost:3001/test-templates/generate-with-rag" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "topic": "present perfect tense",
    "num_questions": 5
  }'
```

This endpoint now uses **semantic search** to find relevant questions from the bank, not just keyword matching.

## Performance Considerations

### Index Tuning

The migrations create IVFFlat indexes tuned for tables of more than 10k rows. For much larger datasets, raise the `lists` parameter. The pgvector documentation suggests starting at `rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that.

```sql
-- For 100k+ rows, increase lists parameter
DROP INDEX IF EXISTS idx_question_embeddings;
CREATE INDEX idx_question_embeddings ON question_bank
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);  -- Default: 100
```

### Embedding Generation Speed

- ~50ms per embedding with Ollama (local)
- Batch generation: 100 questions ≈ 5 seconds
- Recommended: Generate embeddings in the background during off-peak hours

### Query Performance

| Operation | Without Index | With IVFFlat |
|-----------|---------------|--------------|
| Similarity search (10k rows) | ~500ms | ~20ms |
| Similarity search (100k rows) | ~5s | ~50ms |

## Hybrid Search Strategy

The implementation uses a **hybrid approach**:

1. **First**: Try semantic search with embeddings (most accurate)
2. **Fallback**: Full-text search with tsvector (if embeddings are unavailable)

This ensures the system works even if:

- Ollama is temporarily unavailable
- Embeddings haven't been generated yet
- You want to minimize latency for simple queries

## Database Schema

### Question Bank (CMS)

```sql
ALTER TABLE question_bank
  ADD COLUMN embedding vector(384),
  ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_question_embeddings ON question_bank
  USING ivfflat (embedding vector_cosine_ops);
```

### Knowledge Base (LMS)

```sql
ALTER TABLE knowledge_base
  ADD COLUMN embedding vector(384),
  ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_knowledge_base_embeddings ON knowledge_base
  USING ivfflat (embedding vector_cosine_ops);
```

## Troubleshooting

### "extension 'vector' does not exist"

Make sure you're using the pgvector Docker image:

```bash
docker-compose pull db
docker-compose down
docker-compose up -d db
```

### Slow semantic search

1. Check if the index exists:

   ```sql
   SELECT indexname FROM pg_indexes WHERE tablename = 'question_bank';
   ```

2. Verify the index is being used:

   ```sql
   EXPLAIN ANALYZE
   SELECT * FROM question_bank
   ORDER BY embedding <=> '[...]'::vector
   LIMIT 10;
   ```

### Embeddings not generating

1. Check Ollama is running:

   ```bash
   curl http://localhost:11434/api/tags
   ```

2. Verify the model is available:

   ```bash
   ollama list | grep nomic-embed
   ```

3. Check logs for errors:

   ```bash
   # docker logs can write to stderr, so merge it into the pipe
   docker logs openccb-studio-1 2>&1 | grep -i embedding
   ```

## Future Enhancements

Potential improvements:

1. **Multi-vector search** - Combine title, question, and explanation embeddings
2. **Cross-lingual embeddings** - Support Spanish/English/Portuguese semantic search
3. **Query rewriting** - Use an LLM to improve search queries before embedding
4. **Caching** - Cache common query embeddings for faster response
5. **Analytics** - Track which questions are most similar/related

## References

- [pgvector GitHub](https://github.com/pgvector/pgvector)
- [Ollama Embeddings API](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings)
- [Nomic Embed Text Model](https://ollama.com/library/nomic-embed-text)
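
## Appendix: Similarity Threshold Sketch

The `threshold` parameter and the `similarity` field in the search responses refer to cosine similarity. pgvector's `<=>` operator returns cosine *distance*, which is `1 - similarity`, so a threshold of `0.6` corresponds to a distance of at most `0.4`. The sketch below reproduces that filtering and ranking in plain Python on toy vectors. It is only an illustration: `cosine_similarity`, `rank_by_similarity`, and the `(item_id, embedding)` pair shape are hypothetical names chosen for this example, not functions from the OpenCCB services.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        # pgvector raises a comparable error when dimensions differ
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    items: Iterable[Tuple[str, Sequence[float]]],
    threshold: float = 0.6,
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Return (item_id, similarity) pairs at or above threshold, best first.

    `items` is a hypothetical in-memory stand-in for rows with an
    `embedding` column.
    """
    scored = [(item_id, cosine_similarity(query_vec, emb)) for item_id, emb in items]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    toy = [("q1", [1.0, 0.0]), ("q2", [0.6, 0.8]), ("q3", [0.0, 1.0])]
    for item_id, sim in rank_by_similarity([1.0, 0.0], toy, threshold=0.5):
        # Cosine distance, as returned by `<=>`, is 1 - similarity.
        print(item_id, round(sim, 2), "distance:", round(1 - sim, 2))
```

In the database this work is done by the `ORDER BY embedding <=> ... LIMIT n` query, where the IVFFlat index avoids scoring every row. The brute-force loop here is only meant to make the threshold semantics concrete.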