# PGVector Embeddings Implementation Guide

## Overview

OpenCCB now includes **semantic search capabilities** using PostgreSQL's `pgvector` extension and Ollama's embedding models. This enables:

1. **Semantic question search** - Find similar questions in the question bank
2. **Improved RAG for question generation** - Generate questions based on semantic similarity
3. **Enhanced AI tutor chat** - Better context retrieval from the knowledge base

## Architecture

```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│   User Query    │────▶│    Ollama    │────▶│    Embedding    │
│     (text)      │     │ (embeddings) │     │  Vector (384)   │
└─────────────────┘     └──────────────┘     └────────┬────────┘
                                                      │
                                                      ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│ Search Results  │◀────│  PostgreSQL  │◀────│    pgvector     │
│ (similar items) │     │  + pgvector  │     │  cosine search  │
└─────────────────┘     └──────────────┘     └─────────────────┘
```
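
The query path in the diagram can be mimicked in memory to show what the database computes. This is a sketch, not OpenCCB code. `embed` is a stand-in for the Ollama call. The `similarity` the API reports is assumed to be `1 - cosine distance`, where cosine distance is what pgvector's `<=>` operator returns.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Same quantity as pgvector's `<=>` operator: 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError(f"expected {len(a)} dimensions, not {len(b)}")
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def semantic_search(
    query: str,
    rows: dict[str, Vector],
    embed: Callable[[str], Vector],
    limit: int = 10,
    threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Embed the query, rank rows by cosine distance, keep those above the threshold."""
    query_vec = embed(query)  # stand-in for the Ollama embedding request
    scored = [(row_id, 1.0 - cosine_distance(query_vec, vec)) for row_id, vec in rows.items()]
    # Equivalent of ORDER BY embedding <=> query_vec: smallest distance first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(row_id, sim) for row_id, sim in scored if sim >= threshold][:limit]


# Toy 3-dimensional vectors; real embeddings have 384 to 1024 dimensions.
rows = {"q1": [1.0, 0.0, 0.0], "q2": [0.0, 1.0, 0.0], "q3": [0.9, 0.1, 0.0]}
result = semantic_search("past tense", rows, embed=lambda text: [1.0, 0.0, 0.0], limit=2, threshold=0.5)
print(result)
```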

## Installation

### 1. Update Docker Compose

Change the database image to include pgvector:

```yaml
# docker-compose.yml
services:
  db:
    image: pgvector/pgvector:pg16  # Was: postgres:16-alpine
```
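
### 2. Pull Embedding Model

```bash
docker pull ollama/ollama:latest
# docker exec below needs a running container named "ollama"; start one only if it does not exist yet
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
docker exec -it ollama ollama pull nomic-embed-text
```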

### 3. Run Migrations

```bash
# CMS migrations (question_bank embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_cms \
  sqlx migrate run --source services/cms-service/migrations

# LMS migrations (knowledge_base embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_lms \
  sqlx migrate run --source services/lms-service/migrations
```

### 4. Generate Embeddings

After migrating, generate embeddings for existing data:

```bash
# Generate question embeddings
curl -X POST http://localhost:3001/question-bank/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"

# Generate knowledge base embeddings
curl -X POST http://localhost:3002/knowledge-base/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"
```
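
Both `embeddings/generate` endpoints backfill rows that have no embedding yet. The loop they perform can be sketched as follows. This illustration rests on assumptions: rows are plain dicts and `embed` is any function returning a list of floats, whereas the services work against the `embedding` and `embedding_updated_at` columns directly. `to_pgvector_literal` shows the `'[x1,x2,...]'` text format pgvector accepts for a `vector` value.

```python
from datetime import datetime, timezone
from typing import Callable, Sequence


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def generate_missing_embeddings(
    rows: list[dict],
    embed: Callable[[str], Sequence[float]],
    text_field: str = "question_text",
) -> int:
    """Embed every row whose embedding is missing and return how many were updated."""
    updated = 0
    for row in rows:
        if row.get("embedding") is not None:
            continue  # same filter as WHERE embedding IS NULL
        row["embedding"] = to_pgvector_literal(embed(row[text_field]))
        row["embedding_updated_at"] = datetime.now(timezone.utc)
        updated += 1
    return updated


rows = [
    {"id": "a", "question_text": "Choose the correct past tense of 'to go'", "embedding": None},
    {"id": "b", "question_text": "Plural of 'mouse'", "embedding": "[0.5,0.5]"},
]
count = generate_missing_embeddings(rows, embed=lambda text: [0.25, 1.0])
print(count, rows[0]["embedding"])
```

The sketch only touches rows without an embedding, so a second call updates nothing.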

## API Endpoints

### CMS (Port 3001)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/question-bank/embeddings/generate` | Generate embeddings for all questions that do not have one yet |
| POST | `/question-bank/{id}/embedding/regenerate` | Regenerate the embedding for a specific question |
| GET | `/question-bank/semantic-search?query=...` | Search questions by semantic similarity |
| GET | `/question-bank/similar/{id}?threshold=0.85` | Find questions similar to a given question |

### LMS (Port 3002)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/knowledge-base/embeddings/generate` | Generate embeddings for knowledge base entries |
| POST | `/knowledge-base/{id}/embedding/regenerate` | Regenerate the embedding for a specific entry |
| GET | `/knowledge-base/semantic-search?query=...` | Search the knowledge base semantically |

## Configuration

### Environment Variables

```bash
# .env
LOCAL_OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
```
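
The sketch below shows how a client might turn these two variables into a call to Ollama's `POST /api/embed` endpoint. That endpoint takes `{"model": ..., "input": ...}` and returns `{"embeddings": [[...]]}`. The helper names are hypothetical, and the defaults simply mirror the values above. The request is built but not sent.

```python
import json
import os
import urllib.request
from collections.abc import Mapping


def load_embedding_config(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Read the Ollama base URL and model, falling back to the documented defaults."""
    base_url = env.get("LOCAL_OLLAMA_URL", "http://localhost:11434").rstrip("/")
    model = env.get("EMBEDDING_MODEL", "nomic-embed-text")
    return base_url, model


def build_embed_request(text: str, env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build (without sending) a POST /api/embed request for a single text."""
    base_url, model = load_embedding_config(env)
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body: bytes) -> list[float]:
    """Return the single vector from an /api/embed response."""
    return json.loads(body)["embeddings"][0]


req = build_embed_request("past tense verbs", env={"LOCAL_OLLAMA_URL": "http://ollama:11434/"})
print(req.get_method(), req.full_url)
# Against a running Ollama: parse_embed_response(urllib.request.urlopen(req).read())
```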

### Supported Embedding Models

| Model | Dimensions | Speed | Quality | Recommended |
|-------|------------|-------|---------|-------------|
| `nomic-embed-text` | 768 | Fast | Good | ✅ Default |
| `mxbai-embed-large` | 1024 | Medium | Better | For higher accuracy |
| `all-minilm` | 384 | Very fast | Good | For resource-constrained hosts |

Pull models with the commands below. When Ollama runs in Docker, prefix each one with `docker exec -it ollama`.

```bash
ollama pull nomic-embed-text
ollama pull mxbai-embed-large
ollama pull all-minilm
```

## Usage Examples

### 1. Semantic Question Search

```bash
# --data-urlencode escapes the spaces in the query; plain -d with -G appends it unencoded
curl -G "http://localhost:3001/question-bank/semantic-search" \
  --data-urlencode "query=questions about past tense verbs" \
  -d "limit=10" \
  -d "threshold=0.6" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Response:

```json
[
  {
    "id": "uuid-here",
    "question_text": "Choose the correct past tense of 'to go'",
    "question_type": "multiple-choice",
    "similarity": 0.87,
    "tags": ["grammar", "past-tense"],
    "difficulty": "medium",
    "points": 1
  }
]
```
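
The same search can be sent from Python using only the standard library. In this sketch the base URL, token, and helper names are placeholders. The query parameters come from the curl example, and the response is assumed to have the shape shown above. `urlencode` takes care of escaping the spaces in the query string.

```python
import json
import urllib.parse
import urllib.request

CMS_URL = "http://localhost:3001"  # placeholder: CMS service from the API table


def build_semantic_search_request(
    query: str, token: str, limit: int = 10, threshold: float = 0.6
) -> urllib.request.Request:
    """Build (without sending) a GET /question-bank/semantic-search request."""
    params = urllib.parse.urlencode({"query": query, "limit": limit, "threshold": threshold})
    return urllib.request.Request(
        f"{CMS_URL}/question-bank/semantic-search?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


def summarize(body: bytes) -> list[tuple[str, float]]:
    """Pull (question_text, similarity) pairs out of a response like the one above."""
    return [(item["question_text"], item["similarity"]) for item in json.loads(body)]


req = build_semantic_search_request("questions about past tense verbs", token="YOUR_TOKEN")
print(req.full_url)
# With the CMS running: summarize(urllib.request.urlopen(req).read())
```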

### 2. Find Duplicate Questions

```bash
curl -G "http://localhost:3001/question-bank/similar/{question-id}" \
  -d "threshold=0.95" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
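
`similar/{id}` checks one question at a time. To audit a small bank for near-duplicates offline, you can run the same cosine comparison over every pair. The hypothetical sketch below uses the 0.95 threshold from the example. The number of comparisons grows quadratically, so this only suits small exports or spot checks.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity (pgvector's `<=>` distance is 1 minus this); 0.0 for a zero vector."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def find_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.95
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = [
        (id_a, id_b, cosine_similarity(vec_a, vec_b))
        for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2)
    ]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: p[2], reverse=True)


bank = {"q1": [1.0, 0.0], "q2": [0.99, 0.05], "q3": [0.0, 1.0]}
print(find_duplicates(bank))
```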

### 3. RAG Question Generation (Enhanced)

```bash
curl -X POST "http://localhost:3001/test-templates/generate-with-rag" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "topic": "present perfect tense",
    "num_questions": 5
  }'
```

This endpoint now uses **semantic search**, not just keyword matching, to find relevant questions in the bank.
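
This guide does not spell out how the retrieved questions feed the generator. One possibility, shown here as a hypothetical sketch only, is to format the top semantic-search matches into the generation prompt. `search` stands in for the semantic-search call, and the prompt wording is invented for illustration.

```python
from typing import Callable

SearchFn = Callable[[str, int], list[dict]]


def build_rag_prompt(topic: str, num_questions: int, search: SearchFn, k: int = 5) -> str:
    """Assemble a generation prompt from the k questions most similar to the topic."""
    related = search(topic, k)  # e.g. rows from the semantic-search endpoint
    lines = [
        f"Write {num_questions} new questions about: {topic}.",
        "Related questions already in the bank:",
    ]
    lines += [f"- [{q['question_type']}] {q['question_text']}" for q in related]
    return "\n".join(lines)


def fake_search(query: str, k: int) -> list[dict]:
    bank = [{"question_type": "multiple-choice", "question_text": "Choose the present perfect of 'to see'"}]
    return bank[:k]


print(build_rag_prompt("present perfect tense", 5, fake_search))
```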
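
## Performance Considerations

### Index Tuning

The migrations create IVFFlat indexes without a `lists` setting, so pgvector's default of 100 applies. pgvector suggests starting with `lists = rows / 1000` for up to 1M rows and `sqrt(rows)` above that. That makes the default a reasonable fit for roughly 100k rows. IVFFlat also derives its clusters from the rows present when the index is built. Rebuild the index once the initial embeddings exist, and raise `lists` as the table grows:

```sql
-- For ~1M rows (lists = rows / 1000)
DROP INDEX IF EXISTS idx_question_embeddings;

CREATE INDEX idx_question_embeddings
  ON question_bank
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);  -- Default: 100

-- Per session: search more lists for better recall (start near sqrt(lists))
SET ivfflat.probes = 32;
```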
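
pgvector's documentation suggests starting with `lists = rows / 1000` up to 1M rows and `sqrt(rows)` above that. At query time, it suggests setting `ivfflat.probes` to about `sqrt(lists)`. The helper below only encodes those rules of thumb. It is illustrative and not part of pgvector or OpenCCB, and the resulting values should still be confirmed with `EXPLAIN ANALYZE` and recall checks.

```python
import math


def ivfflat_params(row_count: int) -> tuple[int, int]:
    """Starting (lists, probes) values per pgvector's rules of thumb."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = math.isqrt(row_count)
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


for rows in (10_000, 100_000, 1_000_000, 4_000_000):
    lists, probes = ivfflat_params(rows)
    print(f"{rows:>9,} rows: WITH (lists = {lists}); SET ivfflat.probes = {probes};")
```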

### Embedding Generation Speed

- ~50 ms per embedding with a local Ollama instance (depends on hardware and model)
- Batch generation: 100 questions ≈ 5 seconds (100 × 50 ms, one request at a time)
- Recommended: generate embeddings in the background during off-peak hours

### Query Performance

| Operation | Without index | With IVFFlat |
|-----------|---------------|--------------|
| Similarity search (10k rows) | ~500 ms | ~20 ms |
| Similarity search (100k rows) | ~5 s | ~50 ms |

## Hybrid Search Strategy

The implementation uses a **hybrid approach**:

1. **First**: Try semantic search with embeddings (most accurate)
2. **Fallback**: Full-text search with tsvector (if embeddings are unavailable)

This keeps search working, or fast, when:

- Ollama is temporarily unavailable
- Embeddings haven't been generated yet
- A simple query should skip the embedding round-trip to minimize latency
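
The fallback order can be sketched as below. Both search functions are hypothetical stand-ins for the pgvector query and the tsvector query. "Embeddings unavailable" is modelled in two ways: the semantic path raises a connection error (Ollama down), or it returns no rows (embeddings not generated yet).

```python
import logging
from typing import Callable

SearchFn = Callable[[str, int], list[dict]]


def hybrid_search(
    query: str, semantic: SearchFn, full_text: SearchFn, limit: int = 10
) -> tuple[str, list[dict]]:
    """Try semantic search first; fall back to full-text search if it fails or finds nothing."""
    try:
        results = semantic(query, limit)
        if results:
            return "semantic", results
    except OSError as exc:  # covers ConnectionError, TimeoutError, urllib.error.URLError
        logging.warning("semantic search unavailable, using full-text search: %s", exc)
    return "full_text", full_text(query, limit)


def ollama_down(query: str, limit: int) -> list[dict]:
    raise ConnectionError("connection refused")


def keyword_search(query: str, limit: int) -> list[dict]:
    rows = [{"id": "q7", "question_text": "Choose the correct past tense of 'to go'"}]
    return [r for r in rows if query in r["question_text"]][:limit]


source, hits = hybrid_search("past tense", ollama_down, keyword_search)
print(source, hits)
```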

## Database Schema

### Question Bank (CMS)

```sql
-- The vector type is only available once the extension exists in this database
CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE question_bank
  ADD COLUMN embedding vector(384),
  ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_question_embeddings
  ON question_bank
  USING ivfflat (embedding vector_cosine_ops);
```
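
### Knowledge Base (LMS)

```sql
ALTER TABLE knowledge_base
  ADD COLUMN embedding vector(384),
  ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_knowledge_base_embeddings
  ON knowledge_base
  USING ivfflat (embedding vector_cosine_ops);
```

Both `embedding` columns are declared as `vector(384)`, which matches `all-minilm`. The default `nomic-embed-text` produces 768-dimensional vectors, and `mxbai-embed-large` produces 1024-dimensional ones. pgvector rejects a value whose length differs from the column's. Before generating embeddings, either set `EMBEDDING_MODEL=all-minilm` or change the columns to the chosen model's dimension (for example, `vector(768)`).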
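
A startup check can catch a model/column mismatch before any insert fails. In this sketch the dimensions come from the Supported Embedding Models table. The function is hypothetical, not existing OpenCCB code, and the column size would come from the migration (384 in the schema above).

```python
MODEL_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def check_embedding_dimension(model: str, column_dim: int) -> None:
    """Raise ValueError unless the model's output size equals the vector(N) column size."""
    name = model.split(":", 1)[0]  # accept tagged names such as "all-minilm:latest"
    expected = MODEL_DIMENSIONS.get(name)
    if expected is None:
        raise ValueError(f"unknown embedding model {model!r}; look up its dimension first")
    if expected != column_dim:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the column is vector({column_dim})"
        )


check_embedding_dimension("all-minilm", 384)  # matches the schema above
try:
    check_embedding_dimension("nomic-embed-text", 384)
except ValueError as err:
    print(err)
```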

## Troubleshooting

### "extension 'vector' does not exist"

Make sure you're using the pgvector Docker image:

```bash
docker-compose pull db
docker-compose down
docker-compose up -d db
```

### Slow semantic search

1. Check that the index exists:

   ```sql
   SELECT indexname FROM pg_indexes WHERE tablename = 'question_bank';
   ```

2. Verify that the index is being used:

   ```sql
   EXPLAIN ANALYZE SELECT * FROM question_bank
   ORDER BY embedding <=> '[...]'::vector LIMIT 10;
   ```

### Embeddings not generating

1. Check that Ollama is running:

   ```bash
   curl http://localhost:11434/api/tags
   ```

2. Verify that the model is available (inside Docker, run `docker exec ollama ollama list` instead):

   ```bash
   ollama list | grep nomic-embed
   ```

3. Check the logs for errors (`docker logs` writes the container's stderr to stderr, so redirect it before piping):

   ```bash
   docker logs openccb-studio-1 2>&1 | grep -i embedding
   ```
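
Steps 1 and 2 can also be done from code against `GET /api/tags`, whose response lists installed models as `{"models": [{"name": "nomic-embed-text:latest", ...}]}`. This sketch assumes the default URL from the Configuration section. `model_available` needs a running Ollama, while `installed_models` can be checked offline.

```python
import json
import urllib.request


def installed_models(tags_body: bytes) -> set[str]:
    """Model names from an /api/tags response, without the ':tag' suffix."""
    return {m["name"].split(":", 1)[0] for m in json.loads(tags_body).get("models", [])}


def model_available(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Ask a running Ollama whether a model is pulled; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model.split(":", 1)[0] in installed_models(resp.read())


sample = b'{"models": [{"name": "nomic-embed-text:latest"}, {"name": "all-minilm:latest"}]}'
print(sorted(installed_models(sample)))
```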

## Future Enhancements

Potential improvements:

1. **Multi-vector search** - Combine title, question, and explanation embeddings
2. **Cross-lingual embeddings** - Support Spanish/English/Portuguese semantic search
3. **Query rewriting** - Use an LLM to improve search queries before embedding
4. **Caching** - Cache common query embeddings for faster responses
5. **Analytics** - Track which questions are most similar/related

## References

- [pgvector GitHub](https://github.com/pgvector/pgvector)
- [Ollama Embeddings API](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings)
- [Nomic Embed Text Model](https://ollama.com/library/nomic-embed-text)