#evaluations + #llm
Public notes from activescott tagged with both #evaluations and #llm
Wednesday, March 11, 2026
What is Arize Phoenix? - Phoenix
Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting. It provides:
Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation.
Evaluation - Leverage LLMs to benchmark your application's performance using response and retrieval evals.
Datasets - Create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
Experiments - Track and evaluate changes to prompts, LLMs, and retrieval.
Playground - Optimize prompts, compare models, adjust parameters, and replay traced LLM calls.
Prompt Management - Manage and test prompt changes systematically using version control, tagging, and experimentation.