Autoresearch Competitions

A Tangle Blueprint for a decentralized market in verifiable improvement. A Proposer posts a competition: a Surface to improve, a Scorer that measures it, a Reward, and a few knobs. A crowd of Researchers (humans, agents, or automated research loops; the market is type-agnostic) each submit a method, meaning an auto-research agent or improvement code. They do not bring compute: the Node Operator provides a sandbox, either plain Docker or a sealed TEE enclave chosen by a single toggle, and runs the Researcher's method inside it, next to the Proposer's sealed target. A Referee runs the Scorer on a held-out measure and certifies the result. Payment settles on-chain for proven improvement, on a leaderboard anyone can verify.

A Tangle Blueprint is a spec for a service that operators run off-chain and settle on-chain (tnt-core 0.13, EVM, x402 payments, staking/slashing). This repo currently holds a hello-world scaffold; these docs describe the real product we are building.

Thesis

Post a bounty for a better anything — a better trading agent, a better quantum circuit, a better model checkpoint — measured by a test you define, on any cadence, public or private. The network competes or collaborates to build it, and you pay only for proven results. The hard part of research is producing a better artifact; checking that one artifact is better is just running the scorer — so the market pays for the outcome and lets verification stay cheap.

Pay for outcomes, not effort

Research has a solve-hard / verify-easy asymmetry: finding a better artifact can take enormous compute and ingenuity, but confirming it scored higher on a held-out test takes one cheap, reproducible run. Pricing the outcome (the certified score) instead of the effort (hours, GPUs, headcount) makes that asymmetry the whole mechanism.

This dissolves two problems at once. Verification collapses to "run the Scorer", with no need to audit how a Researcher worked. The privacy problem mostly evaporates because Researchers see scores, not data: the Proposer's held-out set, private oracle, or sealed eval never leaves the Referee, yet still produces a number everyone can trust.

The four-knob model

Every competition is defined by four orthogonal knobs. Any combination is valid.

Knob	Options	Meaning
Structure	`Competitive`	Separate submissions, ranked on a leaderboard; pay the top-k.
	`Collaborative`	Pooled compute on one shared artifact; pay by contribution share.
Cadence	`OneShot`	A deadline and a terminal payout; settle once.
	`Continuous`	King-of-the-hill; the leaderboard keeps moving and reward flows for marginal improvement over the current best (streaming / per-epoch).
Visibility	`Public`	Open, viral arena — anyone can watch, enter, and verify.
	`Private`	Sealed enterprise competition behind access control.
Scorer type	`HeldOutEval`	Score against a held-out evaluation split.
	`PrivateOracle`	Score against a hidden reference the Proposer keeps secret.
	`PrivilegedHardware`	Score on hardware only the Referee can run.
	`HumanPanel`	Score via a panel of human judges.

Reference scenario A is Competitive × OneShot × Private × PrivateOracle; B is Competitive × Continuous × Public × HeldOutEval; C is Competitive × OneShot × Private × HeldOutEval. The four knobs span the product.
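
As a rough illustration of how the four knobs could be modeled, the sketch below encodes each knob as a Rust enum and the three reference scenarios as constants. The variant names come from the table above; the `Knobs` struct, its field names, and the `SCENARIO_*` and `COMBINATIONS` constants are hypothetical and are not taken from the repo.

```rust
/// How submissions relate to one another.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Structure {
    Competitive,
    Collaborative,
}

/// When reward is settled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cadence {
    OneShot,
    Continuous,
}

/// Who may watch, enter, and verify.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
}

/// What the Scorer measures against.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScorerType {
    HeldOutEval,
    PrivateOracle,
    PrivilegedHardware,
    HumanPanel,
}

/// One point in the knob space (hypothetical struct, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Knobs {
    pub structure: Structure,
    pub cadence: Cadence,
    pub visibility: Visibility,
    pub scorer_type: ScorerType,
}

impl Knobs {
    /// Scenario A: Competitive x OneShot x Private x PrivateOracle.
    pub const SCENARIO_A: Knobs = Knobs {
        structure: Structure::Competitive,
        cadence: Cadence::OneShot,
        visibility: Visibility::Private,
        scorer_type: ScorerType::PrivateOracle,
    };
    /// Scenario B: Competitive x Continuous x Public x HeldOutEval.
    pub const SCENARIO_B: Knobs = Knobs {
        structure: Structure::Competitive,
        cadence: Cadence::Continuous,
        visibility: Visibility::Public,
        scorer_type: ScorerType::HeldOutEval,
    };
    /// Scenario C: Competitive x OneShot x Private x HeldOutEval.
    pub const SCENARIO_C: Knobs = Knobs {
        structure: Structure::Competitive,
        cadence: Cadence::OneShot,
        visibility: Visibility::Private,
        scorer_type: ScorerType::HeldOutEval,
    };
}

/// Number of distinct knob combinations: 2 structures, 2 cadences,
/// 2 visibilities, 4 scorer types.
pub const COMBINATIONS: usize = 2 * 2 * 2 * 4;
```

Because the knobs are independent fields, scenarios A and C differ only in `scorer_type`, which matches the way the reference scenarios are described.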

How it composes

The chain is the settlement and commitment spine — it carries O(competitions), not O(artifacts). All heavy compute runs in ephemeral sandboxes that scale out horizontally and across instances.

                         ┌──────────────────────────────────────────┐
   Proposer  ──posts──▶  │   On-chain settlement spine (EVM)         │
   (demand,             │   competitions · escrow · certified        │
    escrow)             │   scores · x402 payouts · disputes         │
                         └──────────────┬───────────────────────────┘
                                        │ schedules / settles
                                        ▼
  Researchers ──submit methods──▶ ─────────────────────────────────────┐
  (supply; bring the METHOD,                                            │
   NOT the compute)                                                     ▼
                         ┌──────────────────────────────────────────┐
                         │   Node Operator fleet (Tangle infra)      │
                         │   PROVIDES the compute, RUNS each method  │
                         │   + is the Referee. One-field toggle:     │
                         │   SandboxBackend = Docker (no-TEE) | Tee  │
                         └──────────────┬───────────────────────────┘
            ┌───────────────────────────┼───────────────────────────┐
            ▼                           ▼                           ▼
   ┌─────────────────┐        ┌─────────────────┐        ┌─────────────────┐
   │ OPERATOR        │        │ OPERATOR        │        │ OPERATOR        │
   │ sandbox runs    │  ...   │ sandbox runs    │  ...   │ sandbox runs    │
   │ submitted       │        │ submitted       │        │ submitted       │
   │ method          │        │ method (TEE:    │        │ method          │
   │ (Docker)        │        │ sealed+no-egr.) │        │ (Docker)        │
   └────────┬────────┘        └────────┬────────┘        └────────┬────────┘
            └───────── candidate artifacts ───────────────────────┘
                                        │
                                        ▼
                         ┌──────────────────────────────────────────┐
                         │   Referee  ──runs──▶  Scorer (held-out)   │
                         │   certifies value + CI, commits on-chain  │
                         │   Validator m-of-n backstop on dispute    │
                         └──────────────┬───────────────────────────┘
                                        ▼
                  Verifiable leaderboard  +  artifact marketplace
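
The "Validator m-of-n backstop" box in the diagram can be read as a threshold check over validator votes. The sketch below is one possible reading, not the repo's implementation: the `Attestation` struct and `quorum_upholds` function are hypothetical, validators are identified by plain strings, and signature verification is deliberately left out.

```rust
use std::collections::HashSet;

/// A validator's vote on a disputed score (hypothetical shape; a real
/// attestation would also carry a signature, which is not checked here).
pub struct Attestation {
    pub validator: String,
    pub upholds_score: bool,
}

/// Returns true when at least `m` distinct registered validators uphold the
/// disputed score. Duplicate votes and votes from unregistered validators
/// are not counted.
pub fn quorum_upholds(
    registered: &HashSet<String>,
    attestations: &[Attestation],
    m: usize,
) -> bool {
    let approving: HashSet<&str> = attestations
        .iter()
        .filter(|a| a.upholds_score && registered.contains(&a.validator))
        .map(|a| a.validator.as_str())
        .collect();
    approving.len() >= m
}
```

Collecting approvals into a set keeps a single validator from reaching the threshold by voting more than once.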

This blueprint:

Builds on the agent-sandbox blueprint as the wired operator compute — the SandboxHost seam (autoresearch-sandbox) provisions a sandbox and runs each submitted method via sandbox-runtime (TEE backends Phala / Nitro / GCP / Azure, sealed secrets, cloud / instance / tee-instance modes). The default LocalSandboxHost is an in-process stand-in for tests; the real SandboxRuntimeHost is feature-gated (autoresearch-sandbox-runtime).
Mirrors the ai-trading-blueprint patterns — provision / configure / start / stop / status / deprovision jobs, an operator-hosted sidecar Docker agent loop that runs the submitted method, validator m-of-n EIP-712 attestation, a self-improvement loop, and x402 pricing.
Composes the training-blueprint as the Collaborative Engine (DeMo distributed training over pooled compute).
Ships an agent-profile Scorer stand-in: a closed-form model of agent-profile pass-rate dynamics that produces certified causal lift on held-out data (default minLiftCiLower 0.02, n ≥ 12). A real external agent evaluator can plug into the same seam.
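
The certification defaults quoted above (minLiftCiLower 0.02, n ≥ 12) suggest a simple gate on a lift estimate. The sketch below assumes that `minLiftCiLower` is a floor on the lower confidence bound of the measured lift and that `n` counts held-out samples; both readings, the inclusive comparison, and the `LiftGate` and `LiftEstimate` names are assumptions made for illustration.

```rust
/// Thresholds a lift estimate must clear to be certified (hypothetical type).
pub struct LiftGate {
    /// Minimum acceptable lower confidence bound on the lift.
    pub min_lift_ci_lower: f64,
    /// Minimum number of held-out samples.
    pub min_samples: usize,
}

impl Default for LiftGate {
    /// Defaults quoted in the README: minLiftCiLower 0.02, n >= 12.
    fn default() -> Self {
        LiftGate {
            min_lift_ci_lower: 0.02,
            min_samples: 12,
        }
    }
}

/// A measured lift on held-out data (hypothetical type).
pub struct LiftEstimate {
    /// Lower confidence bound on the lift.
    pub ci_lower: f64,
    /// Number of held-out samples behind the estimate.
    pub n: usize,
}

impl LiftGate {
    /// Certify only when there are enough samples and the lower bound of
    /// the interval clears the floor (assumed to be inclusive).
    pub fn certifies(&self, estimate: &LiftEstimate) -> bool {
        estimate.n >= self.min_samples && estimate.ci_lower >= self.min_lift_ci_lower
    }
}
```

Gating on the lower bound rather than the point estimate means a noisy result with a wide interval does not get certified, even if its mean looks good.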

Core interfaces (pluggable)

Interface	Responsibility
Surface	What may change, and how a candidate artifact is represented and applied.
Scorer	`score(artifact, split) -> {value, ci, cost, diagnostics}`; runs on held-out data. May wrap an eval suite, a private oracle, privileged hardware, or a human panel.
Engine	The method that produces candidates: a sandboxed agent self-improvement loop, a DeMo distributed-training run, a black-box optimizer, or a raw human submission. The Researcher submits it; the Operator runs it on operator-provided sandboxed compute (`SandboxMethodEngine` + `SandboxHost`).
RewardSchedule	`RecordBounty` (marginal lift over best) · `TimeAtTopStreaming` · `SnapshotTopK` · `TerminalPrize`.
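
To make the `score(artifact, split) -> {value, ci, cost, diagnostics}` shape concrete, here is a minimal sketch of a `Scorer` trait with a toy implementation, plus one reading of `RecordBounty`. Only the method name and the four result fields come from the table above. The `Split` enum, the `ExactMatchScorer` type, the normal-approximation interval, the cost unit, and the `rate` parameter of `record_bounty` are all assumptions.

```rust
use std::collections::BTreeMap;

/// Which data split to score on (hypothetical; the table only names "split").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Split {
    Public,
    HeldOut,
}

/// The record returned by `score`, mirroring {value, ci, cost, diagnostics}.
#[derive(Debug, Clone, PartialEq)]
pub struct Score {
    pub value: f64,
    /// (lower, upper) confidence bounds on `value`.
    pub ci: (f64, f64),
    /// Cost of the scoring run; here simply the number of cases compared.
    pub cost: f64,
    pub diagnostics: BTreeMap<String, String>,
}

pub trait Scorer {
    type Artifact;
    fn score(&self, artifact: &Self::Artifact, split: Split) -> Score;
}

/// Toy scorer: the artifact is a list of predicted labels, and the value is
/// the fraction that match the reference labels of the requested split.
pub struct ExactMatchScorer {
    pub public_labels: Vec<u8>,
    pub held_out_labels: Vec<u8>,
}

impl Scorer for ExactMatchScorer {
    type Artifact = Vec<u8>;

    fn score(&self, artifact: &Self::Artifact, split: Split) -> Score {
        let labels = match split {
            Split::Public => &self.public_labels,
            Split::HeldOut => &self.held_out_labels,
        };
        let n = labels.len();
        // Missing predictions count as misses because `n` is the label count.
        let hits = labels.iter().zip(artifact.iter()).filter(|(a, b)| a == b).count();
        let value = if n == 0 { 0.0 } else { hits as f64 / n as f64 };
        // Normal-approximation 95% interval for a proportion, clamped to [0, 1].
        let half = if n == 0 {
            0.0
        } else {
            1.96 * (value * (1.0 - value) / n as f64).sqrt()
        };
        let mut diagnostics = BTreeMap::new();
        diagnostics.insert("hits".to_string(), hits.to_string());
        diagnostics.insert("n".to_string(), n.to_string());
        Score {
            value,
            ci: ((value - half).max(0.0), (value + half).min(1.0)),
            cost: n as f64,
            diagnostics,
        }
    }
}

/// RecordBounty read as "pay for marginal lift over the current best":
/// nothing for ties or regressions, `rate` per unit of improvement otherwise.
pub fn record_bounty(best_so_far: f64, new_value: f64, rate: f64) -> f64 {
    (new_value - best_so_far).max(0.0) * rate
}
```

A Referee built along these lines would call `score` with `Split::HeldOut` and pass the resulting `value` to the reward schedule, so a Researcher never needs to see the held-out labels.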

Roles

Role	In the market
Proposer	Demand side; posts the competition and funds escrow.
Researcher	Supply side; submits a method that the Operator runs. Brings the method, NOT the compute, and never runs it themselves. Human, agent, or automated loop.
Referee	Runs the Scorer, certifies results, commits them on-chain (TEE service, the Proposer, or a committee).
Validator	The m-of-n dispute backstop.
Node Operator	Tangle infra node running the blueprint binary. Provides the sandboxed compute and RUNS the researcher's submitted method (Docker no-TEE or sealed TEE enclave — a one-field toggle), and is the Referee. Distinct from a Researcher: the Researcher submits the method, the Operator runs it.
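
The "one-field toggle" between Docker and TEE could look roughly like the sketch below. The `SandboxBackend` name and its `Docker` and `Tee` variants appear in the diagram above; the `OperatorConfig` struct and the two helper methods are hypothetical. The source states only that the TEE path is sealed with no egress, so the Docker egress policy is left unspecified rather than guessed.

```rust
/// Where the Operator runs a submitted method.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxBackend {
    /// Plain Docker sandbox (no TEE).
    Docker,
    /// Sealed TEE enclave.
    Tee,
}

/// Operator-side configuration with the backend as its single toggle
/// (hypothetical struct, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OperatorConfig {
    pub backend: SandboxBackend,
}

impl SandboxBackend {
    /// Whether the method runs inside a sealed enclave.
    pub fn is_sealed(self) -> bool {
        matches!(self, SandboxBackend::Tee)
    }

    /// Whether network egress is blocked. The diagram marks the TEE path as
    /// "sealed + no egress"; nothing is stated for Docker, so this returns
    /// `None` there instead of inventing a policy.
    pub fn egress_blocked(self) -> Option<bool> {
        match self {
            SandboxBackend::Tee => Some(true),
            SandboxBackend::Docker => None,
        }
    }
}
```

Keeping the choice in one enum field means the rest of the operator code can branch on `config.backend` without caring how either sandbox is provisioned.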

Three reference scenarios

A — Private Oracle (frontier science). Improve against a hidden reference the Proposer never reveals (e.g. a withheld quantum circuit benchmark). Competitive × OneShot × Private × PrivateOracle.
B — Public Continuous Arena. A verifiable, challengeable, moving leaderboard with a marketing microsite — the open arena play. Competitive × Continuous × Public × HeldOutEval.
C — Private Enterprise Bounty. "Improve my agent on my sealed held-out eval" — the monetization motion. Competitive × OneShot × Private × HeldOutEval.

Project structure

Proposed Rust workspace and contract layout (crate names marked (proposed) are not yet implemented):

autoresearch-competitions/
  Cargo.toml                         # Workspace configuration
  metadata/
    blueprint-metadata.json          # Offchain blueprint metadata (IPFS/HTTPS)
  autoresearch-competitions-lib/     # Blueprint library: jobs + router
    src/lib.rs
  autoresearch-competitions-bin/     # Blueprint runner binary
    src/main.rs
  contracts/                         # Solidity: competition registry, escrow,
                                     # certified-score commitments, x402, disputes
  # Proposed crates (design phase):
  crates/
    surface/        # (proposed) Surface trait + built-in surfaces
    scorer/         # (proposed) Scorer trait; HeldOutEval/PrivateOracle/
                    #            PrivilegedHardware/HumanPanel backends
    engine/         # (proposed) Engine trait; sandbox loop, DeMo, optimizer,
                    #            human-submission adapters
    reward/         # (proposed) RewardSchedule implementations
    referee/        # (proposed) Referee service: certify + on-chain commit
    market/         # (proposed) competition lifecycle + leaderboard state

Documentation

These design documents are being authored now (see Status). Links resolve as each lands.

Document	What it covers
`SPEC.md`	The normative spec: knobs, interfaces, roles, on-chain types, and job ABI.
`docs/RESEARCH.md`	Market thesis, prior art, and the EigenCloud / Eigen Arena / OpenRank competitive landscape.
`docs/ARCHITECTURE.md`	System architecture: settlement spine, operator fleet, sandboxes, Referee, and composition with the sandbox/training/agent-profile substrates.
`docs/MECHANISM.md`	Incentive mechanism: reward schedules, marginal-lift pricing, anti-gaming, and dispute resolution.
`docs/PRIVACY.md`	Privacy model: scores-not-data, TEE boundaries, sealed held-out sets, and private oracles.
`ROADMAP.md`	Phased delivery plan from scaffold to the three reference scenarios.
`docs/IMPLEMENTATION-PLAN.md`	Crate-by-crate build plan, milestones, and test strategy.

Status

Design phase. The repo is a hello-world scaffold; the product above is being specified before implementation. Specs come first (SPEC.md and the docs/ set), then implementation proceeds per ROADMAP.md. Crate names marked (proposed) are subject to change as the design lands.

Prerequisites

To run this project, install the following:

Rust 1.86+
Forge (for smart contract development)

You will also need to install cargo-tangle, our CLI tool for creating and deploying Tangle Blueprints:

cargo install cargo-tangle --git https://github.com/tangle-network/blueprint --branch v2

Development

Build the project:

cargo build

Run tests:

cargo test

Deploy the blueprint to the Tangle network:

cargo tangle blueprint deploy tangle --network devnet

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Feedback and Contributions

We welcome feedback and contributions to improve this blueprint. Please open an issue or submit a pull request on our GitHub repository.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
