Nurfog/openccb

Nurfog 64d3d5be91 feat: implementing embedding AI

2026-03-18 17:15:39 -03:00

PGVector Embeddings Implementation Guide

Overview

OpenCCB now includes semantic search capabilities using PostgreSQL's pgvector extension and Ollama's embedding models. This enables:

Semantic question search - Find similar questions in the question bank
Improved RAG for question generation - Generate questions based on semantic similarity
Enhanced AI tutor chat - Better context retrieval from knowledge base

Architecture

┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│   User Query    │────▶│   Ollama     │────▶│  Embedding      │
│   (text)        │     │  (embeddings)│     │  Vector (384)   │
└─────────────────┘     └──────────────┘     └────────┬────────┘
                                                      │
                                                      ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Search Results │◀────│  PostgreSQL  │◀────│  pgvector       │
│  (similar items)│     │  + pgvector  │     │  cosine search  │
└─────────────────┘     └──────────────┘     └─────────────────┘
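The lookup in the diagram can be sketched in a few lines of Python. This is a toy in-memory illustration, not OpenCCB's code: in the real system Ollama produces the vector and pgvector does the comparison inside PostgreSQL. pgvector's `<=>` operator returns cosine distance, which is 1 minus the cosine similarity computed here. The function names and the 3-dimensional vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, items, limit=10):
    """Return (item_id, similarity) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query_vec, vec))
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
items = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.9, 0.1, 0.0],
    "q3": [0.0, 1.0, 0.0],
}
results = rank_by_similarity([1.0, 0.0, 0.0], items, limit=2)
```

An index such as IVFFlat exists to avoid this full scan over every row; the ranking it approximates is the same.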

Installation

1. Update Docker Compose

Change the database image to include pgvector:

# docker-compose.yml
services:
  db:
    image: pgvector/pgvector:pg16  # Was: postgres:16-alpine

2. Pull Embedding Model

docker pull ollama/ollama:latest
docker exec -it ollama ollama pull nomic-embed-text

3. Run Migrations

# CMS migrations (question_bank embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_cms \
  sqlx migrate run --source services/cms-service/migrations

# LMS migrations (knowledge_base embeddings)
DATABASE_URL=postgresql://user:password@localhost:5433/openccb_lms \
  sqlx migrate run --source services/lms-service/migrations

4. Generate Embeddings

After migration, generate embeddings for existing data:

# Generate question embeddings
curl -X POST http://localhost:3001/question-bank/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"

# Generate knowledge base embeddings
curl -X POST http://localhost:3002/knowledge-base/embeddings/generate \
  -H "Authorization: Bearer YOUR_TOKEN"

API Endpoints

CMS (Port 3001)

Method	Endpoint	Description
POST	`/question-bank/embeddings/generate`	Generate embeddings for all questions without them
POST	`/question-bank/{id}/embedding/regenerate`	Regenerate embedding for a specific question
GET	`/question-bank/semantic-search?query=...`	Search questions by semantic similarity
GET	`/question-bank/similar/{id}?threshold=0.85`	Find questions similar to a given question

LMS (Port 3002)

Method	Endpoint	Description
POST	`/knowledge-base/embeddings/generate`	Generate embeddings for knowledge base entries
POST	`/knowledge-base/{id}/embedding/regenerate`	Regenerate embedding for a specific entry
GET	`/knowledge-base/semantic-search?query=...`	Search knowledge base semantically

Configuration

Environment Variables

# .env
LOCAL_OLLAMA_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
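As a rough illustration of how these two variables might be consumed, the sketch below reads them and builds a request for Ollama's `/api/embeddings` endpoint, which accepts a `model` and a `prompt`. Whether OpenCCB calls this endpoint or the newer `/api/embed` is not stated in this guide, so treat the endpoint choice, the fallback defaults, and the function names as assumptions.

```python
import json
import os

def embedding_config(env=None):
    """Read embedding settings, falling back to the values shown in .env above."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LOCAL_OLLAMA_URL", "http://localhost:11434").rstrip("/"),
        "model": env.get("EMBEDDING_MODEL", "nomic-embed-text"),
    }

def build_embedding_request(text, config):
    """Return (url, json_body) for an Ollama embedding call; nothing is sent."""
    url = f"{config['base_url']}/api/embeddings"
    body = json.dumps({"model": config["model"], "prompt": text})
    return url, body

# Pass a dict instead of os.environ to keep the example deterministic.
config = embedding_config({"LOCAL_OLLAMA_URL": "http://localhost:11434/"})
url, body = build_embedding_request("past tense verbs", config)
```

Posting `body` to `url` should return a JSON object whose `embedding` field holds the vector.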

Supported Embedding Models

Model	Dimensions	Speed	Quality	Recommended
`nomic-embed-text`	768	Fast	Good	✅ Default
`mxbai-embed-large`	1024	Medium	Better	For higher accuracy
`all-minilm`	384	Very Fast	Good	For resource-constrained

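The `vector(N)` column size must match the output dimension of the configured model. The schema in this guide declares `vector(384)`, which matches `all-minilm`; `nomic-embed-text` produces 768-dimensional vectors and needs `vector(768)`.

Pull models with: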

ollama pull nomic-embed-text
ollama pull mxbai-embed-large
ollama pull all-minilm
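Because pgvector rejects a vector whose length differs from the column's declared size, a small guard before writing embeddings can catch a model/schema mismatch early. The sketch below is an assumption about how such a check could look; the dimensions come from the table above, and the function name is hypothetical.

```python
# Output dimensions from the "Supported Embedding Models" table.
MODEL_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}

def check_dimension(model, column_dim):
    """Raise ValueError if the model's output size differs from vector(column_dim)."""
    expected = MODEL_DIMENSIONS[model]
    if expected != column_dim:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the column is vector({column_dim})"
        )
    return expected

ok = check_dimension("all-minilm", 384)

try:
    check_dimension("nomic-embed-text", 384)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```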

Usage Examples

1. Semantic Question Search

curl -G "http://localhost:3001/question-bank/semantic-search" \
  -d "query=questions about past tense verbs" \
  -d "limit=10" \
  -d "threshold=0.6" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

[
  {
    "id": "uuid-here",
    "question_text": "Choose the correct past tense of 'to go'",
    "question_type": "multiple-choice",
    "similarity": 0.87,
    "tags": ["grammar", "past-tense"],
    "difficulty": "medium",
    "points": 1
  }
]
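For callers that prefer a script over curl, the same request and response can be handled as below. This is a sketch under assumptions: it only builds the URL and parses a response body shaped like the example above, the helper names are hypothetical, and the second row in the sample data is invented for illustration.

```python
import json
from urllib.parse import urlencode

def semantic_search_url(base_url, query, limit=10, threshold=0.6):
    """Build the semantic-search URL with properly encoded query parameters."""
    params = urlencode({"query": query, "limit": limit, "threshold": threshold})
    return f"{base_url}/question-bank/semantic-search?{params}"

def top_matches(response_body, min_similarity):
    """Parse the JSON array and keep rows at or above min_similarity, best first."""
    rows = json.loads(response_body)
    hits = [row for row in rows if row["similarity"] >= min_similarity]
    return sorted(hits, key=lambda row: row["similarity"], reverse=True)

url = semantic_search_url("http://localhost:3001", "past tense")

# Sample body: first row follows the documented response, second is made up.
sample = json.dumps([
    {"id": "uuid-here", "question_text": "Choose the correct past tense of 'to go'",
     "similarity": 0.87},
    {"id": "uuid-other", "question_text": "Name three irregular verbs",
     "similarity": 0.61},
])
hits = top_matches(sample, min_similarity=0.8)
```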

2. Find Duplicate Questions

curl -G "http://localhost:3001/question-bank/similar/{question-id}" \
  -d "threshold=0.95" \
  -H "Authorization: Bearer YOUR_TOKEN"
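To make the threshold concrete: a pair of questions counts as a likely duplicate when the cosine similarity of their embeddings is at or above the threshold. The sketch below applies that rule to every pair in a small in-memory set. It is illustrative only; the endpoint compares one question against the bank in the database, and the vectors and function names here are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def near_duplicate_pairs(embeddings, threshold=0.95):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        similarity = cosine_similarity(vec_a, vec_b)
        if similarity >= threshold:
            pairs.append((id_a, id_b, similarity))
    return pairs

# Toy 2-dimensional vectors (hypothetical): "a" and "b" point almost the same way.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.05],
    "c": [0.0, 1.0],
}
pairs = near_duplicate_pairs(embeddings, threshold=0.95)
```

A threshold of 0.95 is strict enough that only near-identical wording should match; lowering it toward 0.85 widens the net to paraphrases.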

3. RAG Question Generation (Enhanced)

curl -X POST "http://localhost:3001/test-templates/generate-with-rag" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "topic": "present perfect tense",
    "num_questions": 5
  }'

This endpoint now retrieves relevant questions from the bank by semantic similarity rather than by keyword matching alone.
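The retrieval step feeds the generation step: questions found by semantic search are placed into the prompt as context. The guide does not show OpenCCB's prompt, so the template and function name below are purely hypothetical and only illustrate the general RAG pattern.

```python
def build_rag_prompt(topic, num_questions, similar_questions):
    """Assemble a generation prompt that includes retrieved questions as context."""
    context = "\n".join(f"- {question}" for question in similar_questions)
    return (
        f"Write {num_questions} new questions about: {topic}\n"
        f"Use these existing questions as style and scope references:\n"
        f"{context}"
    )

# The retrieved questions would come from semantic search; these are placeholders.
prompt = build_rag_prompt(
    "present perfect tense",
    5,
    ["Choose the correct form: 'She ___ (live) here since 2010.'",
     "Rewrite the sentence using the present perfect."],
)
```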

Performance Considerations

Index Tuning

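The migrations create IVFFlat indexes with the default `lists = 100`, which suits tables of up to roughly 100k rows under pgvector's rule of thumb of `rows / 1000` lists (for tables up to 1M rows). For larger datasets, rebuild the index with more lists: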

-- For 100k+ rows, increase lists parameter
DROP INDEX IF EXISTS idx_question_embeddings;
CREATE INDEX idx_question_embeddings 
ON question_bank 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 1000);  -- Default: 100
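The pgvector documentation suggests starting points for these parameters: `rows / 1000` lists for tables up to 1M rows, `sqrt(rows)` lists above that, and `sqrt(lists)` probes at query time. The helper below encodes those starting points; the function names are made up, and the results are initial guesses to benchmark against, not tuned values.

```python
import math

def suggested_lists(row_count):
    """Starting value for the IVFFlat `lists` parameter."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)

def suggested_probes(lists):
    """Starting value for `ivfflat.probes` (higher: better recall, slower queries)."""
    return max(1, math.isqrt(lists))

lists_100k = suggested_lists(100_000)
lists_1m = suggested_lists(1_000_000)
lists_4m = suggested_lists(4_000_000)
probes_for_100 = suggested_probes(100)
```

The probes value is applied per session with `SET ivfflat.probes = ...;` before running the search query.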

Embedding Generation Speed

~50ms per embedding with Ollama (local)
Batch generation: 100 questions ≈ 5 seconds
Recommended: Generate embeddings in background during off-peak hours
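The figures above imply a simple linear estimate, which helps when deciding how to schedule a backfill. The sketch below uses the guide's ~50ms figure as a default and shows one way to split the work into batches; both helpers are hypothetical and assume embeddings are generated one at a time.

```python
def estimate_seconds(count, ms_per_embedding=50):
    """Estimated wall-clock time for generating `count` embeddings sequentially."""
    return count * ms_per_embedding / 1000

def batches(ids, batch_size):
    """Yield successive slices of `ids`, each at most `batch_size` long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

seconds_for_100 = estimate_seconds(100)
seconds_for_10k = estimate_seconds(10_000)
chunks = list(batches(list(range(5)), 2))
```

At this rate a 10k-question bank takes a little over eight minutes, which is why a background job is the safer choice.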

Query Performance

Operation	Without Index	With IVFFlat
Similarity search (10k rows)	~500ms	~20ms
Similarity search (100k rows)	~5s	~50ms

Hybrid Search Strategy

The implementation uses a hybrid approach:

First: Try semantic search with embeddings (most accurate)
Fallback: Full-text search with tsvector (if embeddings unavailable)

The fallback keeps search working when:

Ollama is temporarily unavailable
Embeddings haven't been generated yet
Low latency matters more than ranking quality for a simple query
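The two-step strategy can be sketched as a small wrapper. This is an assumption about the control flow rather than OpenCCB's actual code: the two search functions are passed in as hypothetical callables, a `ConnectionError` stands in for Ollama being unreachable, and an empty semantic result is treated as "embeddings unavailable".

```python
def hybrid_search(query, semantic_search, fulltext_search):
    """Try semantic search first; fall back to full-text search if it fails or is empty."""
    try:
        results = semantic_search(query)
    except ConnectionError:
        # Embedding service unreachable: no query vector can be produced.
        results = []
    if results:
        return "semantic", results
    return "fulltext", fulltext_search(query)

# Stub backends for demonstration only.
def ollama_down(query):
    raise ConnectionError("Ollama unreachable")

def semantic_ok(query):
    return ["semantic hit"]

def fulltext(query):
    return ["keyword hit"]

when_down = hybrid_search("past tense", ollama_down, fulltext)
when_up = hybrid_search("past tense", semantic_ok, fulltext)
```

Returning the strategy name alongside the results makes it easy to log how often the fallback is taken.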

Database Schema

Question Bank (CMS)

ALTER TABLE question_bank
ADD COLUMN embedding vector(384),
ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_question_embeddings 
ON question_bank 
USING ivfflat (embedding vector_cosine_ops);

Knowledge Base (LMS)

ALTER TABLE knowledge_base
ADD COLUMN embedding vector(384),
ADD COLUMN embedding_updated_at TIMESTAMPTZ;

CREATE INDEX idx_knowledge_base_embeddings 
ON knowledge_base 
USING ivfflat (embedding vector_cosine_ops);

Troubleshooting

"extension 'vector' does not exist"

Make sure you're using the pgvector Docker image:

docker-compose pull db
docker-compose down
docker-compose up -d db

Slow semantic search

Check if index exists:

SELECT indexname FROM pg_indexes WHERE tablename = 'question_bank';

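Verify that the index is being used (the plan should show an index scan on `idx_question_embeddings` rather than a sequential scan):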

EXPLAIN ANALYZE SELECT * FROM question_bank 
ORDER BY embedding <=> '[...]'::vector LIMIT 10;
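When running such a query from a script, the query vector has to be sent in pgvector's text format, a bracketed comma-separated list. The sketch below formats a vector and pairs it with a parameterized query. The `%s` placeholders assume a psycopg-style driver and the helper names are hypothetical; no database connection is made here.

```python
def to_vector_literal(values):
    """Format numbers as a pgvector text literal such as [0.1,0.25,1.0]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def similarity_query(query_vector, limit=10):
    """Return (sql, params) for a cosine-distance search on question_bank."""
    sql = (
        "SELECT id, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM question_bank "
        "WHERE embedding IS NOT NULL "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(query_vector)
    # The literal appears twice: once for the score, once for the ordering.
    return sql, (literal, literal, limit)

literal = to_vector_literal([0.1, 0.25, 1])
sql, params = similarity_query([0.1, 0.25, 1], limit=10)
```

Keeping the `ORDER BY embedding <=> ... LIMIT n` shape matters, since that is the pattern an IVFFlat index can serve.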

Embeddings not generating

Check Ollama is running:

curl http://localhost:11434/api/tags

Verify model is available:

ollama list | grep nomic-embed

Check logs for errors:

docker logs openccb-studio-1 | grep -i embedding

Future Enhancements

Potential improvements:

Multi-vector search - Combine title, question, and explanation embeddings
Cross-lingual embeddings - Support Spanish/English/Portuguese semantic search
Query rewriting - Use LLM to improve search queries before embedding
Caching - Cache common query embeddings for faster response
Analytics - Track which questions are most similar/related
