hex-foundation — Test Matrix

This document describes the test suite, what each test verifies, and how to run it locally.

Test categories

| Category | Files | Needs API key |
|---|---|---|
| Static / unit | `test_skill_frontmatter.sh`, `test_skill_refs.sh`, `test_path_mapping.bats`, `test_hex_doctor_version_sync.bats`, `test_hex_doctor_hex_binary_version_sync.bats` | No |
| Core E2E (containerized) | `tests/core-e2e/run-all.sh` | BOI suites only |
| Live eval, Claude Code | `test_skill_discovery.sh`, `test_e2e.sh`, `test_fullstack.sh` | Yes |
| Live eval, Codex | `test_skill_discovery_codex.sh`, `test_codex_onboarding.sh` | Yes |
| Codex parity (containerized) | `tests/codex-parity/run-all.sh` | No (structural); `OPENAI_API_KEY` for live |
| Migration | `tests/migrate/test-migrate.sh` | No |
| Memory | `test_memory.py` | No |

Core E2E suite (`tests/core-e2e/`)

`run-all.sh` auto-discovers every `tests/core-e2e/suites/*.sh` file and runs it. Non-BOI suites run inside the container built from `tests/core-e2e/Dockerfile`; BOI integration suites run on the host, because they need Docker access to spin up their own containers.
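The host/container split described above could be sketched roughly as follows. This is an illustration only: `partition_suites` is a hypothetical helper, and treating any suite whose name contains `boi` as a host suite is an assumption, not a description of the actual `run-all.sh`.

```bash
# Sketch only: label each discovered suite as "host" (BOI) or "container".
# partition_suites and the "*boi*" name test are assumptions for illustration.
partition_suites() {
  local suites_dir="$1" suite name
  for suite in "$suites_dir"/*.sh; do
    [ -e "$suite" ] || continue                  # glob matched nothing
    name="$(basename "$suite" .sh)"
    case "$name" in
      *boi*) printf 'host %s\n' "$name" ;;       # needs Docker access on the host
      *)     printf 'container %s\n' "$name" ;;  # runs inside the Dockerfile image
    esac
  done
}

# Example call: partition_suites tests/core-e2e/suites
```

Each output line pairs a location with a suite name, so a caller could dispatch the two groups to different jobs.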

CI runs both jobs on every PR and blocks merges on failure (see .github/workflows/core-e2e.yml).

```bash
# All suites (host must have Docker)
bash tests/core-e2e/run-all.sh

# Filter by pattern; useful when iterating on a specific suite
bash tests/core-e2e/run-all.sh --include boi          # BOI suites only
bash tests/core-e2e/run-all.sh --exclude boi          # skip BOI (e.g. inside Docker)
bash tests/core-e2e/run-all.sh --include 'install|upgrade'  # regex match on suite name
```
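To illustrate how include/exclude patterns of this kind behave, here is a minimal sketch that filters suite names read from standard input. `filter_suites` is a hypothetical helper, and matching with `grep -E` (extended regular expressions) is an assumption about the semantics, not the runner's confirmed implementation.

```bash
# Sketch only: keep names matching the include regex and drop names
# matching the exclude regex. An empty include pattern keeps everything.
filter_suites() {
  local include="${1:-.}" exclude="${2:-}"
  if [ -n "$exclude" ]; then
    grep -E -- "$include" | grep -Ev -- "$exclude"
  else
    grep -E -- "$include"
  fi
}

# Example call: printf '%s\n' test-boi-install test-cli | filter_suites '' 'boi'
```

Under these assumptions, a pattern such as `install|upgrade` selects any suite whose name contains either word.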

Current suites:

| Suite | What it verifies |
|---|---|
| `test-boi-install` | Fresh BOI install: binary builds, `--help`/`--version`, smoke dispatch |
| `test-boi-upgrade` | Upgrade path: version bump, stale-symlink detection, doctor catches dangling link |
| `test-cli` | All `hex` subcommands reachable; version matches `Cargo.toml` |
| `test-messaging` | Message send/receive/filter with SQLite verification |
| `test-doctor` | `hex-doctor` passes on healthy install, fails loudly on broken config |

Tests added in v0.2.4

`tests/test_skill_frontmatter.sh`

Validates every system/skills/*/SKILL.md without running any agent. Checks:

- A frontmatter block exists at the top of the file.
- The `name` field is present and matches the skill directory name.
- The `description` field is present and non-empty.
- If `allowed-tools` is present, it is a YAML list of strings.

Exit 0 = all valid. Exit 1 = summary of failures.
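The checks listed above can be approximated for a single file as in the sketch below. `check_skill` is a hypothetical name. The sketch assumes simple unquoted `key: value` frontmatter delimited by `---` lines, does not parse full YAML, and omits the `allowed-tools` check.

```bash
# Sketch only: validate one SKILL.md. Returns 0 when valid, 1 with a message otherwise.
check_skill() {
  local file="$1" dir_name front name desc
  dir_name="$(basename "$(dirname "$file")")"

  # Frontmatter must open on line 1 and be closed by a second '---' line.
  [ "$(head -n 1 "$file")" = "---" ] || { echo "no frontmatter: $file"; return 1; }
  [ "$(grep -c '^---[[:space:]]*$' "$file")" -ge 2 ] \
    || { echo "unterminated frontmatter: $file"; return 1; }

  # Collect the lines between the opening and closing '---'.
  front="$(awk 'NR == 1 { next } /^---[[:space:]]*$/ { exit } { print }' "$file")"

  name="$(printf '%s\n' "$front" | sed -n 's/^name:[[:space:]]*//p' | head -n 1)"
  desc="$(printf '%s\n' "$front" | sed -n 's/^description:[[:space:]]*//p' | head -n 1)"

  [ "$name" = "$dir_name" ] || { echo "name '$name' != dir '$dir_name': $file"; return 1; }
  [ -n "$desc" ] || { echo "empty description: $file"; return 1; }
}
```

A wrapper could loop over `system/skills/*/SKILL.md`, count the failures, and exit 1 if any file fails.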

`tests/test_skill_refs.sh`

Installs hex to a temp dir and verifies that every path reference inside SKILL.md files resolves on disk. Catches broken references to scripts, templates, or commands before they reach users.

`tests/test_skill_discovery.sh`

Runs Claude Code in --print mode inside a fresh hex install and asserts:

- All currently shipped skills appear in Claude's response to a discovery prompt. The session-lifecycle skills (hex-startup, hex-checkpoint, hex-shutdown, hex-reflect, hex-debrief) were demolished and must not be expected here.
- At least 3 skills (`/hex-doctor`, `/hex-decide`, `/hex-triage`) can be invoked without crashing.

Requires ~/.hex-test.env with ANTHROPIC_API_KEY.

`tests/test_skill_discovery_codex.sh`

Mirror of the above for Codex. Because Codex reads AGENTS.md rather than SKILL.md files directly, this test verifies that the shipped skill names surface via the AGENTS.md context and that Codex can perform the same three invocations.

Codex parity suite (`tests/codex-parity/`)

Seven tests that verify behavioral parity between the Claude Code and Codex runtimes. Runs inside a Docker container with Node.js + Codex CLI installed. Structural tests run without an API key; live-dispatch tests are skipped automatically when OPENAI_API_KEY is absent.

```bash
bash tests/codex-parity/run-all.sh
```

| Test | What it verifies | API key |
|---|---|---|
| `test-install-shape.sh` | Fresh hex install produces `.hex/scripts/`, `.hex/skills/`, `.hex/bin/`, `CLAUDE.md`, `AGENTS.md` | No |
| `test-agents-md-complete.sh` | `AGENTS.md` covers all sections present in `CLAUDE.md` | No |
| `test-skill-discovery.sh` | All skills are discoverable from `.hex/skills/*/SKILL.md` under Codex | No |
| `test-doctor-codex.sh` | `doctor.sh` includes and passes the Codex CLI check | No |
| `test-upgrade-codex.sh` | `upgrade.sh` preserves `AGENTS.md` user customizations | No |
| `test-boi-dispatch-codex.sh` | Minimal spec with `runtime=codex` completes and produces output | Yes |
| `test-memory-search.sh` | Memory search index and CLI work identically under the Codex runtime | No |
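The automatic skip for live-dispatch tests could be implemented with a guard along these lines. This is a sketch under stated assumptions: `run_live` is a hypothetical wrapper, and both the `SKIP` message format and the choice to return 0 on a skip are illustrative rather than taken from the suite.

```bash
# Sketch only: run a live test command only when OPENAI_API_KEY is set.
# A skipped test is reported and treated as a success (return 0).
run_live() {
  local name="$1"
  shift
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "SKIP $name (OPENAI_API_KEY not set)"
    return 0
  fi
  "$@"   # key present: run the actual test command
}

# Example call: run_live boi-dispatch bash tests/codex-parity/test-boi-dispatch-codex.sh
```

Returning 0 on a skip keeps the structural tests green on machines without a key while still leaving a visible trace in the log.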

The codex-parity gate in the hex release cut battery runs this suite and blocks the release on failure; structural tests always run, live tests are skipped when no key is present. The gate is skipped loudly when the directory is absent or --skip-parity (or --skip-e2e, which implies it) is passed.

Running locally

Prerequisites

- Docker (for the Docker eval suite)
- Tart (for the macOS eval suite; Apple Silicon only)
- `~/.hex-test.env` containing at minimum:
```
ANTHROPIC_API_KEY=sk-ant-...
```
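Before starting a live run, it can help to confirm that the key file is in place. The following preflight check is a sketch only: `require_key` is a hypothetical helper, not part of the hex test scripts, and it assumes the file uses plain `KEY=value` or `export KEY=value` lines.

```bash
# Sketch only: fail fast when the env file or the named key is missing.
require_key() {
  local env_file="$1" key_name="$2"
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  # Accept "KEY=value" or "export KEY=value" with a non-empty value.
  grep -Eq "^(export[[:space:]]+)?${key_name}=.+" "$env_file" \
    || { echo "$key_name not set in $env_file" >&2; return 1; }
}

# Example call: require_key "$HOME/.hex-test.env" ANTHROPIC_API_KEY
```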

Static tests (no API key)

```bash
cd /path/to/hex-foundation

bash tests/test_skill_frontmatter.sh
bash tests/test_skill_refs.sh
bash tests/migrate/test-migrate.sh
python3 tests/test_memory.py
bats tests/test_hex_doctor_version_sync.bats
bats tests/test_hex_doctor_hex_binary_version_sync.bats
```

Full Docker eval suite

```bash
bash tests/eval/run_eval_docker.sh --live
```

Individual cases:

```bash
bash tests/eval/run_eval_docker.sh --live --case skill-frontmatter
bash tests/eval/run_eval_docker.sh --live --case skill-refs
bash tests/eval/run_eval_docker.sh --live --case skill-discovery
bash tests/eval/run_eval_docker.sh --live --case skill-discovery-codex
```

Full macOS Tart eval suite

```bash
bash tests/eval/run_eval_macos.sh
```

Shipped skills

The skills installed under `.hex/skills/` (verified by `test_skill_discovery.sh`) are listed below. `hex-consolidate` was removed in favor of the `hex memory consolidate full|quick` binary subcommand, which is now the single consolidate surface (see architecture.md). The session-lifecycle skills (hex-startup, hex-checkpoint, hex-shutdown, hex-reflect, hex-debrief) were demolished: they are no longer shipped and must not be re-added to this test plan.

- hex-decide
- hex-triage
- hex-doctor
- landings
- memory
