Day 7: Week 1 Capstone: Full RAG Pipeline — AI System Engineer

Week 1 Capstone — full RAG pipeline architecture


🎯 Goal of the Day

Build a document-grounded Q&A API with FastAPI, hybrid search, reranking, streaming, and RAGAS evaluation.

project core

📚 Study Resources

Hugging Face — Code a Simple RAG from Scratch

A complete RAG built from scratch with Python and Ollama, no frameworks.

LangChain — Build a RAG Agent

A framework-based approach, for comparison with the from-scratch version.

DataCamp — RAG System with LangChain & FastAPI

Production-oriented: LangChain for RAG plus FastAPI for the API layer.

KDnuggets — 7 Steps to Build RAG from Scratch

Clean step-by-step guide. Ingestion, splitting, embedding, storage, query, retrieval, generation.

💡 Key Concepts

Full Pipeline: Query → Token counting → Decomposition → Hybrid retrieval → Reranking → Prompt + CoT → Streaming → Eval

API Layer: FastAPI with /ingest, /query (streaming), and /evaluate endpoints

Error Handling: Retry logic, exponential backoff, model fallback, rate limit handling

Integration Testing: End-to-end tests of the whole pipeline, including RAGAS metrics
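
The Error Handling concept (retry, exponential backoff, model fallback) can be sketched with the standard library alone. This is a minimal illustration, not a reference implementation: `RateLimitError`, `call_with_retry`, `generate_with_fallback`, and the `call_model` callback are hypothetical names introduced here, and a real service would catch whatever exception its model provider raises for HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by a model provider."""


def call_with_retry(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted, let the caller decide what to do
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def generate_with_fallback(prompt, models, call_model, **retry_kwargs):
    """Try each model in order, moving on once retries for one are exhausted."""
    last_error = None
    for model in models:
        try:
            return call_with_retry(
                lambda m=model: call_model(m, prompt), **retry_kwargs
            )
        except RateLimitError as err:
            last_error = err  # remember the failure and try the next model
    raise RuntimeError("all models failed") from last_error
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. Only rate-limit errors are retried in this sketch; whether to also retry 5xx errors is a design choice left open here.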

🔧 Hands-on Exercise

Build a full RAG pipeline API.

Ingestion pipeline: load docs, chunk (512 tokens, 50-token overlap), embed, store in ChromaDB
Retrieval: hybrid search (vector + BM25), top-10, rerank to top-3
Generation: system prompt + CoT, streaming SSE response
FastAPI: /ingest, /query (streaming), /evaluate endpoints
Error handling: retry, fallback, rate limit handling
RAGAS evaluation: 10+ test questions, faithfulness > 0.7
Test streaming in a browser or with curl, then trigger a rate limit and verify the fallback
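
The chunking and retrieval steps above can be sketched without any framework. The example below is a rough illustration under stated assumptions: the tokens come from whatever tokenizer you use (any list works), the two ranked lists stand in for vector and BM25 results, and the fusion method is reciprocal rank fusion, which is one common way to merge rankings and not something the exercise prescribes. The function names and the `rerank` scoring callback are hypothetical.

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank_of_d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(vector_hits, bm25_hits, top_k=10, rerank=None, final_k=3):
    """Fuse both rankings, keep top_k, optionally rerank, return final_k ids."""
    fused = reciprocal_rank_fusion([vector_hits, bm25_hits])[:top_k]
    if rerank is not None:
        # `rerank` is a hypothetical callback mapping a doc id to a relevance
        # score, e.g. a wrapper around a cross-encoder.
        fused = sorted(fused, key=rerank, reverse=True)
    return fused[:final_k]
```

With the defaults, each chunk starts 462 tokens after the previous one, so consecutive chunks share exactly 50 tokens. Fusing on ranks rather than raw scores sidesteps the fact that vector similarities and BM25 scores live on different scales.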

🧠 Self-Assessment Questions

Explain how self-attention computes attention weights from the Q/K/V matrices.
When would you use few-shot vs. chain-of-thought vs. meta-prompting?
Why is MMLU becoming less useful for comparing frontier models?
What is the difference between a 429 and a 500 error, and how should you handle each?
Why is a chunk size of 256-512 tokens the recommended default?
What does RAGAS faithfulness measure, and why does it matter?
When should you add reranking on top of hybrid search rather than using hybrid search alone?