gecx eval reference

Usage: gecx eval <scenarios-dir> [options]

Arguments

  • <scenarios-dir> — directory to walk for *.scenario.ts, *.scenario.yaml, and *.scenario.yml files.
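The matching rule above can be sketched as a predicate (an assumption about how the walk filters file names, not the CLI's actual code):

```typescript
// Sketch (assumption): a file anywhere under <scenarios-dir> is picked up
// when its name ends in .scenario.ts, .scenario.yaml, or .scenario.yml.
function isScenarioFile(name: string): boolean {
  return /\.scenario\.(ts|yaml|yml)$/.test(name);
}

console.log(isScenarioFile("login.scenario.ts")); // true
console.log(isScenarioFile("login.test.ts"));     // false
```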

Options

  • --json — print the JSON EvalReport to stdout instead of the table
  • --output <path> — write the JSON report to a file
  • --baseline <path> — path to a previous EvalReport JSON to compare against
  • --fail-on-regress — exit non-zero when any regression threshold trips, or when a new scenario fails relative to the baseline
  • --regression-config <path> — JSON file with { regressionThresholds: {...} } overrides
  • --update-baseline <path> — write the new report to this path (use after intentional improvements)
  • --filter <tag> — only run scenarios that include this tag
  • --config <path> — EvalConfig JSON file (providers, scorers, regression thresholds)
  • --help, -h — print usage
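Putting the flags together, a typical baseline-gated run might look like this (the directory and report paths are illustrative, not required names):

```
gecx eval ./evals \
  --baseline reports/baseline.json \
  --fail-on-regress \
  --output reports/latest.json
```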

Exit codes

  • 0 — all scenarios passed (or skipped) and no regression tripped
  • 1 — at least one scenario failed, baseline parse error, config parse error, or --fail-on-regress tripped

Environment variables

  • ANTHROPIC_API_KEY — enable the Anthropic judge provider
  • OPENAI_API_KEY — enable the OpenAI judge provider
  • GEMINI_API_KEY — enable the Gemini judge provider

When a key is missing, scorers that need that provider return status: 'skipped'; the scenario itself is not failed.
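The skip behavior can be pictured with a small sketch. This is an assumption about the logic, not the actual scorer implementation; runJudgeScorer is a hypothetical name:

```typescript
type ScoreStatus = "passed" | "failed" | "skipped";

// Sketch (assumption): a provider-backed scorer checks its API key first
// and returns "skipped" -- rather than "failed" -- when the key is absent.
function runJudgeScorer(envKey: string, judge: () => boolean): ScoreStatus {
  if (!process.env[envKey]) return "skipped"; // no key: skip, don't fail
  return judge() ? "passed" : "failed";
}
```

A skipped scorer therefore contributes no failure to the scenario's overall result.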

JSON report

The shape is defined in schemas/eval-report.schema.json. Top-level keys:

  • runId — UUID for this run
  • startedAt / finishedAt — ISO timestamps
  • scenarios[] — per-scenario result with expectations and the full ScenarioRunRecord
  • metrics — aggregate metrics (see schema for full list)
  • env — { node, sdkVersion, providers: { anthropic, openai, gemini } }
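As a reading aid, the keys above can be sketched as a TypeScript shape. The field types here are inferred guesses; schemas/eval-report.schema.json remains the source of truth:

```typescript
// Hypothetical shape; consult schemas/eval-report.schema.json for the real one.
interface EvalReportSketch {
  runId: string;     // UUID for this run
  startedAt: string; // ISO timestamp
  finishedAt: string;
  scenarios: Array<{
    expectations: unknown[]; // per-scenario expectations
    record: unknown;         // the full ScenarioRunRecord
  }>;
  metrics: Record<string, number>; // aggregate metrics (see schema)
  env: {
    node: string;
    sdkVersion: string;
    providers: { anthropic: boolean; openai: boolean; gemini: boolean };
  };
}
```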

Source: docs/reference/eval-cli.md