Week 3 · Day 18/30

Observability for AI

Logging, tracing, monitoring LLM calls, cost tracking

📅 2026-03-21 ⏱️ 5-6 hours 📊 Security & Production
Overall progress 60%

🎯 Goal of the day

Implement end-to-end observability: distributed tracing with OpenTelemetry, cost tracking, and alerting.

core practice

📚 Study Resources

📊

OpenTelemetry — LLM Observability Introduction

Official introduction: traces, metrics, and logs for LLM applications. Black box → glass box.

article
🔧

OpenLLMetry — Open-source LLM Observability

OpenTelemetry extensions for LLMs: auto-instrument OpenAI, Anthropic, and vector DBs.

tool
📖

Agenta AI — LLM Observability with OpenTelemetry

AI Engineer's guide: distributed tracing, metrics, cost tracking.

guide
📈

OneUptime — LLM Observability Dashboard (Feb 2026)

Build dashboard: latency, token usage, costs, error rates, model performance.

tutorial

💡 Key Concepts

Distributed Tracing — Records the entire request flow: user → API → LLM → tools → response. Built on spans and trace IDs.
OpenTelemetry — Industry standard for observability. Semantic conventions 1.37+ for generative AI.
LLM Metrics — Latency (p50/p95/p99), tokens per request, cost per request, error rate, cache hit rate.
Cost Tracking — Per-request cost calculation, budget alerts, model-tier breakdown.
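
The cost tracking concept above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the model names, the prices in `PRICES_PER_MILLION`, and the helpers `request_cost`, `cost_by_model`, and `over_budget` are all hypothetical. Real prices should come from your provider's current price list.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens (placeholder values, not real vendor pricing).
PRICES_PER_MILLION = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass
class LLMCall:
    """Token usage of one LLM request, as reported by the provider's response."""
    model: str
    input_tokens: int
    output_tokens: int


def request_cost(call: LLMCall) -> float:
    """Per-request cost in USD: tokens times the per-million price of each direction."""
    price = PRICES_PER_MILLION[call.model]
    total = call.input_tokens * price["input"] + call.output_tokens * price["output"]
    return total / 1_000_000


def cost_by_model(calls: list[LLMCall]) -> dict[str, float]:
    """Model-tier breakdown: total spend grouped by model name."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call.model] += request_cost(call)
    return dict(totals)


def over_budget(calls: list[LLMCall], budget_usd: float) -> bool:
    """Budget alert condition: True once total spend exceeds the budget."""
    return sum(request_cost(c) for c in calls) > budget_usd


calls = [
    LLMCall("small-model", input_tokens=1000, output_tokens=500),
    LLMCall("large-model", input_tokens=2000, output_tokens=1000),
]
print(cost_by_model(calls))
print(over_budget(calls, budget_usd=0.01))
```

In a traced system, the same per-request cost would typically be attached to the span of the LLM call as an attribute, so the dashboard can aggregate it alongside latency and token counts.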

🔧 Hands-on exercise

Add observability to an existing AI system.

  1. Install OpenLLMetry for auto-instrumentation
  2. Set up Jaeger or Zipkin for viewing traces
  3. Add custom spans for tool calls and retrieval
  4. Build a cost tracking dashboard
  5. Set up alerts for high latency and error rate
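
For step 5, the alert conditions can be prototyped before wiring them into a real alerting backend. The sketch below is a standard-library illustration only: `RequestRecord`, `percentile`, `evaluate_alerts`, and the threshold values are hypothetical names and numbers chosen for this example. It uses the nearest-rank method for p95, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One finished LLM request inside the evaluation window."""
    latency_ms: float
    ok: bool


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(
    records: list[RequestRecord],
    max_p95_ms: float = 2000.0,    # assumed latency threshold
    max_error_rate: float = 0.05,  # assumed error-rate threshold (5%)
) -> list[str]:
    """Return the names of the alerts that fire for this window of requests."""
    if not records:
        return []
    alerts = []
    p95 = percentile([r.latency_ms for r in records], 95)
    if p95 > max_p95_ms:
        alerts.append("high_latency")
    error_rate = sum(1 for r in records if not r.ok) / len(records)
    if error_rate > max_error_rate:
        alerts.append("high_error_rate")
    return alerts


# 18 fast successful requests plus 2 slow failures in one window.
window = [RequestRecord(100.0, True)] * 18 + [RequestRecord(5000.0, False)] * 2
print(evaluate_alerts(window))
```

Alerting on p95 rather than the mean keeps a handful of slow LLM calls visible even when most requests are fast. In production the same thresholds would usually live in the monitoring system's alert rules rather than in application code.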