Embeddings, vector databases, chunking strategies, retrieval methods
Understand and implement a RAG pipeline, from chunking through embeddings to vector search and generation.
A clear, practical walkthrough of the three core RAG components. (article)
In-depth analysis of chunking strategies, with benchmarks, from a vector DB maker. (article)
Recursive 512-token splitting: 69% accuracy vs. 54% for semantic chunking. Defaults: 256-512 tokens, 10-20% overlap. (benchmark)
OpenAI text-embedding-3, Voyage 3.5, Nomic V2, Cohere embed-v4. MTEB benchmarks + pricing. (comparison)
FAISS, Chroma, Pinecone, Qdrant, Weaviate. Chroma for prototyping, Qdrant for hybrid search. (comparison)
Build a document Q&A system.
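
As a possible starting point for the exercise, here is a minimal, standard-library-only sketch of the retrieval half of such a pipeline. It is a toy under stated assumptions, not a reference implementation: `chunk_tokens` is a plain sliding window (not the recursive splitter from the benchmark), "tokens" are lowercase words rather than model tokens, a bag-of-words `Counter` stands in for a real embedding model, and a brute-force cosine scan stands in for a vector database. All function names are hypothetical, and the generation step is left out: `build_prompt` only assembles the text you would send to an LLM. The default `size` and `overlap` are taken from the 256-512 token, 10-20% overlap range quoted above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real model tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_tokens(tokens, size=256, overlap=0.15):
    """Split tokens into fixed-size windows that share a fraction of overlap."""
    if size <= 0 or not 0 <= overlap < 1:
        raise ValueError("size must be positive and overlap in [0, 1)")
    step = max(1, int(size * (1 - overlap)))  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end
            break
    return chunks


def embed(tokens):
    """Toy sparse 'embedding': term counts (assumption, not a real model)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Brute-force search: score every chunk, return the top k as (score, text)."""
    query_vec = embed(tokenize(query))
    scored = [(cosine(query_vec, embed(chunk)), " ".join(chunk)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, contexts):
    """Assemble retrieved chunks and the question into a single prompt string."""
    context_block = "\n".join(f"- {text}" for text in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


if __name__ == "__main__":
    document = (
        "Chroma is handy for prototyping. Qdrant supports hybrid search. "
        "FAISS is a library for similarity search."
    )
    # Tiny window so the short demo text yields several chunks.
    chunks = chunk_tokens(tokenize(document), size=5, overlap=0.2)
    hits = retrieve("hybrid search qdrant", chunks, k=2)
    print(build_prompt("Which store supports hybrid search?", [text for _, text in hits]))
```

To turn this into a working Q&A system, you would swap `embed` for one of the embedding models listed above, replace the scan in `retrieve` with a query against one of the vector stores, and pass the output of `build_prompt` to a language model.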