gecx eval reference

Usage: gecx eval <scenarios-dir> [options]

Arguments

  • <scenarios-dir> — directory to walk for *.scenario.ts, *.scenario.yaml, and *.scenario.yml files.
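The matching rule above can be sketched as a predicate (an assumption about how the walk filters file names, not the CLI's actual code):

```typescript
// Sketch (assumption): a file anywhere under <scenarios-dir> is picked up
// when its name ends in .scenario.ts, .scenario.yaml, or .scenario.yml.
function isScenarioFile(name: string): boolean {
  return /\.scenario\.(ts|yaml|yml)$/.test(name);
}

console.log(isScenarioFile("login.scenario.ts")); // true
console.log(isScenarioFile("login.test.ts"));     // false
```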

Options

  • --json — print the JSON EvalReport to stdout instead of the table
  • --output <path> — write the JSON report to a file
  • --baseline <path> — path to a previous EvalReport JSON to compare against
  • --fail-on-regress — exit non-zero when any regression threshold trips, or when a new scenario fails relative to the baseline
  • --regression-config <path> — JSON file with { regressionThresholds: {...} } overrides
  • --update-baseline <path> — write the new report to this path (use after intentional improvements)
  • --filter <tag> — only run scenarios that include this tag
  • --config <path> — EvalConfig JSON file (providers, scorers, regression thresholds)
  • --help, -h — print usage
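Putting the flags together, a typical baseline-gated run might look like this (the directory and report paths are illustrative, not required names):

```
gecx eval ./evals \
  --baseline reports/baseline.json \
  --fail-on-regress \
  --output reports/latest.json
```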

Exit codes

  • 0 — all scenarios passed (or skipped) and no regression tripped
  • 1 — at least one scenario failed, baseline parse error, config parse error, or --fail-on-regress tripped

Environment variables

  • ANTHROPIC_API_KEY — enable the Anthropic judge provider
  • OPENAI_API_KEY — enable the OpenAI judge provider
  • GEMINI_API_KEY — enable the Gemini judge provider

When a key is missing, scorers that need that provider return status: 'skipped'; the scenario itself is not failed.
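The skip behavior can be pictured with a small sketch. This is an assumption about the logic, not the actual scorer implementation; runJudgeScorer is a hypothetical name:

```typescript
type ScoreStatus = "passed" | "failed" | "skipped";

// Sketch (assumption): a provider-backed scorer checks its API key first
// and returns "skipped" -- rather than "failed" -- when the key is absent.
function runJudgeScorer(envKey: string, judge: () => boolean): ScoreStatus {
  if (!process.env[envKey]) return "skipped"; // no key: skip, don't fail
  return judge() ? "passed" : "failed";
}
```

A skipped scorer therefore contributes no failure to the scenario's overall result.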

JSON report

The shape is defined in schemas/eval-report.schema.json. Top-level keys:

  • runId — UUID for this run
  • startedAt / finishedAt — ISO timestamps
  • scenarios[] — per-scenario result with expectations and the full ScenarioRunRecord
  • metrics — aggregate metrics (see schema for full list)
  • env — { node, sdkVersion, providers: { anthropic, openai, gemini } }
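As a reading aid, the keys above can be sketched as a TypeScript shape. The field types here are inferred guesses; schemas/eval-report.schema.json remains the source of truth:

```typescript
// Hypothetical shape; consult schemas/eval-report.schema.json for the real one.
interface EvalReportSketch {
  runId: string;     // UUID for this run
  startedAt: string; // ISO timestamp
  finishedAt: string;
  scenarios: Array<{
    expectations: unknown[]; // per-scenario expectations
    record: unknown;         // the full ScenarioRunRecord
  }>;
  metrics: Record<string, number>; // aggregate metrics (see schema)
  env: {
    node: string;
    sdkVersion: string;
    providers: { anthropic: boolean; openai: boolean; gemini: boolean };
  };
}
```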

Source: docs/reference/eval-cli.md