financial-research-agent

AgentV companion eval project for a public coding/web financial research agent.

This repository is not a fork of Dexter and does not own Dexter's agent code or dataset. It uses Dexter's public src/evals/ dataset as a pinned benchmark fixture and golden-answer source so the AgentV Dashboard can show a realistic, publicly available domain-specific agent project.

Source Pin

The first public demo is pinned to Dexter commit:

8d9419829f443f84b804d033bb2c3b1fbd788629

Dexter's own eval flow at that commit uses:

bun run src/evals/run.ts
optional sampling with --sample N
src/evals/dataset/finance_agent.csv
CSV columns: Question, Answer, Question Type, Expert time (mins), Rubric
an LLM-as-judge correctness check (the CSV Rubric column also carries correctness and contradiction criteria, but Dexter's runtime judge does not read them)
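Free-text cells such as Question, Answer, and Rubric can contain commas, double quotes, and line breaks, so splitting each line on "," is not enough to read a file like finance_agent.csv. The sketch below is a minimal RFC 4180-style parser, assuming standard double-quote escaping; parseCsv and toRecords are hypothetical names, not functions from Dexter or from scripts/generate-eval-from-dexter.ts.

```typescript
// Minimal RFC 4180-style CSV parser (a sketch, not the generator's actual code).
// Handles quoted fields with embedded commas, escaped quotes (""), and newlines.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // "" inside quotes is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the file does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Turns parsed rows into objects keyed by the header row's column names.
function toRecords(text: string): Record<string, string>[] {
  const [header, ...body] = parseCsv(text);
  if (!header) return [];
  return body.map((cells) =>
    Object.fromEntries(header.map((name, j) => [name, cells[j] ?? ""] as const)),
  );
}
```

Keying records by header name means a column such as "Expert time (mins)" is read by its exact CSV label, so a column reorder upstream would not silently shift values.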

The committed AgentV eval keeps the question/answer fixture shape for every row in the pinned CSV: Dexter questions become AgentV input, and Dexter answers become expected_output. Dexter's runtime evaluator ignores the CSV Rubric column, but this project intentionally preserves those entries as native AgentV llm-grader rubrics. The shared prompt in prompts/dexter-grader.md receives AgentV's {{ rubrics_json }} and {{ metadata_json }} structured variables, so the eval does not duplicate question/answer data into grader-only payloads.
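A minimal sketch of the row-to-test mapping described above: the question becomes input, the golden answer becomes expected_output, the Rubric cell becomes rubric items, and row-specific fields go into per-test metadata. The EvalTest shape, the dexter-row-N ID scheme, and the assumption that each Rubric cell is a JSON array of { operator, criteria } objects are all hypothetical; the generated YAML and AgentV's eval schema define the real layout.

```typescript
// Hypothetical shapes for illustration; AgentV's real eval schema may differ.
interface DexterRow {
  Question: string;
  Answer: string;
  "Question Type": string;
  "Expert time (mins)": string;
  Rubric: string;
}

interface RubricItem {
  operator: string; // e.g. a correctness check or a contradiction guard
  criteria: string;
}

interface EvalTest {
  id: string;
  input: string;
  expected_output: string;
  rubrics: RubricItem[];
  metadata: {
    source_row: number;
    question_type: string;
    expert_time_mins: number | null;
  };
}

// Assumption: each Rubric cell is a JSON array of { operator, criteria } objects.
function parseRubric(cell: string): RubricItem[] {
  if (cell.trim() === "") return [];
  const parsed: unknown = JSON.parse(cell);
  if (!Array.isArray(parsed)) throw new Error("Rubric cell is not a JSON array");
  return parsed.map((item) => ({
    operator: String(item.operator),
    criteria: String(item.criteria),
  }));
}

// Maps one CSV row (0-based index) to an AgentV-style test object.
function rowToTest(row: DexterRow, index: number): EvalTest {
  const minutes = row["Expert time (mins)"].trim();
  const parsedMinutes = Number(minutes);
  return {
    id: `dexter-row-${index + 1}`, // hypothetical ID scheme
    input: row.Question, // Dexter question -> AgentV input
    expected_output: row.Answer, // Dexter golden answer -> expected_output
    rubrics: parseRubric(row.Rubric),
    metadata: {
      source_row: index + 1, // assumed 1-based data-row number
      question_type: row["Question Type"],
      expert_time_mins: minutes === "" || Number.isNaN(parsedMinutes) ? null : parsedMinutes,
    },
  };
}
```

Keeping question and answer only in input and expected_output, and passing rubrics separately, mirrors the README's point that the grader payload does not duplicate question/answer data.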

By default, the eval does not run Dexter. It runs a coding/web research agent on Dexter's questions and grades its answers against Dexter's public golden answers, so the demo does not require FINANCIAL_DATASETS_API_KEY. The real dexter-agent target remains available as an optional compatibility target for users who have configured Dexter's paid data prerequisites.

Prerequisites

Install AgentV separately.

For the default financial-research-agent target, configure a Codex-style coding agent plus a grader:

AGENT_TARGET=financial-research-agent
CODEX_EXECUTABLE=codex-eng
CODEX_MODEL=gpt-5.5
CODEX_REASONING_EFFORT=low
CODEX_WORKSPACE_DIR=.agentv/codex-workspaces
CODEX_LOG_DIR=.agentv/logs/codex
GRADER_TARGET=openai-grader
OPENAI_API_KEY=...
OPENAI_MODEL=gpt-5.5

Clone and pin Dexter only when regenerating eval YAML from Dexter's CSV or when running the optional real dexter-agent target:

git clone https://github.com/virattt/dexter.git ../dexter
git -C ../dexter checkout 8d9419829f443f84b804d033bb2c3b1fbd788629
cd ../dexter
bun install

Create a local .env file for this project:

cp .env.example .env

Fill in only local values in .env. Do not commit .env, resolved provider endpoints, API keys, Bitwarden output, or result-repo tokens.

Required variables for the default public-demo target:

AGENT_TARGET=financial-research-agent
CODEX_EXECUTABLE
CODEX_MODEL
CODEX_WORKSPACE_DIR
CODEX_LOG_DIR
GRADER_TARGET
grader model variables for the selected grader target
for GRADER_TARGET=openai-grader (as in the example above): OPENAI_API_KEY and OPENAI_MODEL
for GRADER_TARGET=azure: AZURE_OPENAI_RESPONSES_BASE_URL, AZURE_OPENAI_API_KEY, and AZURE_DEPLOYMENT_NAME
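A preflight check over these variables can be approximated by collecting the required names for the selected grader and reporting only the ones that are missing. This is a hypothetical helper (missingVars, reportMissing), not the contents of the project's setup script; the openai-grader set is inferred from the example configuration rather than documented.

```typescript
// Hypothetical preflight helper; the project's real setup script may differ.
type Env = Record<string, string | undefined>;

// Required for the default financial-research-agent target (from this README).
const BASE_VARS: readonly string[] = [
  "AGENT_TARGET",
  "CODEX_EXECUTABLE",
  "CODEX_MODEL",
  "CODEX_WORKSPACE_DIR",
  "CODEX_LOG_DIR",
  "GRADER_TARGET",
];

// Grader-specific names. The azure set is documented in this README; the
// openai-grader set is an assumption based on the example configuration.
const GRADER_VARS = new Map<string, readonly string[]>([
  ["azure", ["AZURE_OPENAI_RESPONSES_BASE_URL", "AZURE_OPENAI_API_KEY", "AZURE_DEPLOYMENT_NAME"]],
  ["openai-grader", ["OPENAI_API_KEY", "OPENAI_MODEL"]],
]);

function missingVars(env: Env): string[] {
  const required = [...BASE_VARS];
  const graderTarget = env.GRADER_TARGET;
  const graderVars = graderTarget ? GRADER_VARS.get(graderTarget) : undefined;
  if (graderVars) required.push(...graderVars);
  // Unset and empty-string variables both count as missing.
  return required.filter((name) => !env[name]);
}

// Reports names only; resolved values are never included in the message.
function reportMissing(env: Env): string {
  const missing = missingVars(env);
  return missing.length === 0
    ? "All required variables are set."
    : `Missing variables: ${missing.join(", ")}`;
}
```

Under Bun, reportMissing(process.env) would return either a confirmation or a comma-separated list of missing names, never their values, which keeps the check inside the secret boundary described below.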

Additional variables for optional AGENT_TARGET=dexter-agent:

DEXTER_REPO_PATH
OPENAI_API_KEY
FINANCIAL_DATASETS_API_KEY
EXASEARCH_API_KEY or TAVILY_API_KEY

Run

Preflight:

bun run setup

Run the full AgentV eval:

agentv eval evals/financial-research-agent.eval.yaml --targets .agentv/targets.yaml --target financial-research-agent

During AgentV repository development, prefer the source CLI from the AgentV checkout:

bun /path/to/agentv/apps/cli/src/cli.ts eval financial-research-agent/evals/financial-research-agent.eval.yaml --targets financial-research-agent/.agentv/targets.yaml --target financial-research-agent

For quick verification, run one committed test by ID:

agentv eval evals/financial-research-agent.eval.yaml --targets .agentv/targets.yaml --target financial-research-agent --test-id us-steel-nippon-merger

To run the real Dexter agent instead, use --target dexter-agent after setting the optional Dexter variables above.

Regenerate From Dexter CSV

After updating DEXTER_REPO_PATH and DEXTER_COMMIT, regenerate the full AgentV eval from Dexter's public CSV:

bun run scripts/generate-eval-from-dexter.ts --out evals/financial-research-agent.eval.yaml

Use --sample N --out <path> only for local experiments or quick generator checks; never commit a sampled file in place of the full eval, which must cover every row of the pinned CSV.

Review the generated eval before committing. The generator intentionally keeps the conversion conservative and AgentV-native: it preserves Dexter rubric entries as { operator, criteria }-style llm-grader rubric items, uses suite-level source metadata for the pinned CSV, and reuses prompts/dexter-grader.md by file reference.

Secret Boundary

Setup and target scripts print variable names and missing prerequisite guidance only. They must not print resolved secret values, private endpoints, or Bitwarden-derived output.

Public result synchronization is handled by the downstream financial-research-agent-evals work, not by this repository. Before publishing any run artifact, scan it for API keys, provider endpoints, private paths, and other sensitive data.
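A pre-publish scan can match each line of an artifact against secret-like patterns and report only pattern names and line numbers, so the scan output does not itself echo a secret. The helper name scanArtifact and every pattern below are illustrative assumptions, not an exhaustive or project-provided list.

```typescript
// Hypothetical pre-publish scanner; patterns are illustrative, not exhaustive.
interface Finding {
  pattern: string; // which rule matched
  line: number; // 1-based line number in the artifact
}

// No /g flag, so RegExp.test keeps no lastIndex state between lines.
const SECRET_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "openai-style key", regex: /\bsk-[A-Za-z0-9_-]{20,}/ },
  { name: "bearer token", regex: /\bBearer\s+[A-Za-z0-9._~+\/-]{20,}/i },
  { name: "azure openai endpoint", regex: /https:\/\/[a-z0-9-]+\.openai\.azure\.com/i },
  { name: "home directory path", regex: /(?:\/Users\/|\/home\/|[A-Za-z]:\\Users\\)[^\s"']+/ },
  { name: "env assignment of a key", regex: /\b[A-Z0-9_]*(?:API_KEY|TOKEN|SECRET)\s*[=:]\s*\S+/ },
];

// Returns pattern names and line numbers only; matched text is never copied.
function scanArtifact(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((lineText, i) => {
    for (const { name, regex } of SECRET_PATTERNS) {
      if (regex.test(lineText)) findings.push({ pattern: name, line: i + 1 });
    }
  });
  return findings;
}
```

A non-empty result would block publishing until each flagged line is reviewed; pattern matching catches only known shapes, so a manual review of private paths and data is still needed.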

AgentV Composition Note

The Dexter adaptation uses AgentV's native llm-grader primitive. Each assertion references prompts/dexter-grader.md and passes Dexter CSV rubric entries through rubrics, preserving operator plus criteria so the prompt can distinguish correctness checks from contradiction guards. Suite-level metadata carries the pinned Dexter source fields, while per-test metadata only carries row-specific fields such as source_row, question_type, and expert_time_mins.
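To make the suite/per-test split concrete, the sketch below merges suite-level and per-test metadata and substitutes {{ rubrics_json }} and {{ metadata_json }} into a prompt template. It only illustrates the idea: AgentV performs its own rendering, and the helper names buildMetadata and renderGraderPrompt, as well as the merge order (per-test fields overriding suite fields), are assumptions.

```typescript
// Hypothetical sketch; AgentV's actual template rendering may differ.
type Metadata = Record<string, string | number | null>;

interface RubricItem {
  operator: string;
  criteria: string;
}

// Suite-level source fields plus row-specific fields; per-test keys win on collision.
function buildMetadata(suite: Metadata, test: Metadata): Metadata {
  return { ...suite, ...test };
}

// Replaces {{ name }} placeholders with JSON; unknown names are left untouched.
function renderGraderPrompt(template: string, rubrics: RubricItem[], metadata: Metadata): string {
  const vars: Record<string, string> = {
    rubrics_json: JSON.stringify(rubrics, null, 2),
    metadata_json: JSON.stringify(metadata, null, 2),
  };
  // A replacer function avoids "$" substitution patterns inside the JSON.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}
```

Because the prompt receives rubrics and metadata as structured JSON, the same prompts/dexter-grader.md file can serve every test while the operator field tells correctness checks apart from contradiction guards.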
