
Hybrid Router - RouterArena Submission #148

Closed
mikemao27 wants to merge 15 commits into RouteWorks:main from mikemao27:dev/hybrid-router

@mikemao27

Contributor

Relevant Files

  • router_inference/predictions/hybrid-router.json - regular predictions
  • router_inference/predictions/hybrid-router-robustness.json - robustness predictions
  • router_inference/config/hybrid-router.json - router configuration

Submission Steps

  1. Fork https://github.com/RouteWorks/RouterArena
  2. Copy these files into your fork:
    • router_inference/predictions/hybrid-router.json
    • router_inference/predictions/hybrid-router-robustness.json
    • router_inference/config/hybrid-router.json
  3. Open PR to RouteWorks/RouterArena
  4. Comment /evaluate on the PR
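Before opening the PR, a quick local check can catch a missing or malformed file. This is a minimal Python sketch that assumes only that each of the three files must exist and parse as JSON; the real prediction and config schemas aren't described here, so they are not validated. `check_submission` and `SUBMISSION_FILES` are hypothetical names, not part of RouterArena.

```python
import json
from pathlib import Path

# Hypothetical constant: the three paths listed in "Submission Steps".
SUBMISSION_FILES = [
    "router_inference/predictions/hybrid-router.json",
    "router_inference/predictions/hybrid-router-robustness.json",
    "router_inference/config/hybrid-router.json",
]


def check_submission(repo_root: str) -> dict[str, str]:
    """Map each problematic submission file to a reason: "missing" or "invalid JSON"."""
    problems: dict[str, str] = {}
    root = Path(repo_root)
    for rel in SUBMISSION_FILES:
        path = root / rel
        if not path.is_file():
            problems[rel] = "missing"
            continue
        try:
            # Only checks that the file is well-formed JSON, not its schema.
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems[rel] = "invalid JSON"
    return problems


if __name__ == "__main__":
    issues = check_submission(".")
    for rel, reason in issues.items():
        print(f"{reason}: {rel}")
    if not issues:
        print("All three files present and parseable; ready to open the PR.")
```

Run it from the root of the fork; an empty result means all three files are present and parse as JSON.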

Expected Performance

  • Accuracy: ~71.35%
  • Robustness Score: ~0.9667
  • Routing Distribution: ~98.85% to qwen3-235b-a22b, ~3.4% to ministral-3b, the remainder to qwen3-30b-a3b.

@mikemao27

Contributor Author

/evaluate

9 similar comments

@github-actions


Router Evaluation Results

Router: hybrid-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
|---|---|
| RouterArena Score | 0.7208 |
| Accuracy | 71.38% |
| Total Cost | $0.321511 |
| Avg Cost per Query | $0.000038 |
| Avg Cost per 1K Queries | $0.0383 |
| Number of Queries | 8400 |
| Abnormal Entries | 0 |
| Robustness Score | 0.9667 |

Optimality Metrics

| Metric | Value |
|---|---|
| Opt.Sel (Optimal Selection) | 0.8987 |
| Opt.Cost (Cost Efficiency) | 0.9419 |
| Opt.Acc (Accuracy vs Optimal) | 0.9281 |

Evaluation completed by RouterArena automated workflow
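The two per-query cost figures in the results are consistent with dividing the total cost by the number of queries. A small arithmetic check, assuming (not confirmed by the workflow output) that this is how they are computed:

```python
# Figures transcribed from the evaluation results table.
TOTAL_COST_USD = 0.321511
NUM_QUERIES = 8400

# Assumption: both figures are plain averages of total cost over all queries.
avg_per_query = TOTAL_COST_USD / NUM_QUERIES
avg_per_1k = avg_per_query * 1000

print(f"Avg cost per query:      ${avg_per_query:.6f}")  # $0.000038
print(f"Avg cost per 1K queries: ${avg_per_1k:.4f}")     # $0.0383
```

Unrounded, the per-query average is about $0.0000383, which the table shows as $0.000038 and, scaled to 1K queries, as $0.0383.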

@yl231

yl231 commented Jun 18, 2026

Contributor

Thanks @mikemao27 — the submission itself looks clean and reproduces on our side, so we're glad to add it. One request before we merge: please slim the PR down to the files a leaderboard submission actually needs.

Per the README ("Submitting to the leaderboard"), a submission only requires:

  • router_inference/config/hybrid-router.json
  • router_inference/predictions/hybrid-router.json
  • router_inference/predictions/hybrid-router-robustness.json
  • model_cost/model_cost.json (cost entries for your new models)
  • universal_model_names.py (the new model names)

Could you drop the rest from this PR?

  • hybrid_router/, training/, tests/, scripts/*.sh — your implementation/training code. We don't vendor router code in this repo; if you'd like to share it, please link your own repo in the PR description instead.
  • cached_results/*.jsonl and the *.lock files — local inference-cache artifacts that shouldn't be committed.
  • metrics.json — ephemeral eval output, please don't commit it.
  • pyproject.toml / uv.lock / .gitignore — not needed for a predictions-only submission.

For reference, a clean submission was just 5 files. Once trimmed, re-post /evaluate and we'll merge. Thanks!
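One way to confirm the trimmed PR touches only the allowed files is to compare its changed-file list (for example, the output of `git diff --name-only` against the base branch) with the five entries above. A minimal Python sketch; `ALLOWED`, `is_allowed`, and `extra_files` are hypothetical names, and because the comment gives `universal_model_names.py` without a directory, the sketch accepts either an exact path or a matching trailing path component:

```python
# Hypothetical allow-list, transcribed from the maintainer's comment.
ALLOWED = {
    "router_inference/config/hybrid-router.json",
    "router_inference/predictions/hybrid-router.json",
    "router_inference/predictions/hybrid-router-robustness.json",
    "model_cost/model_cost.json",
    "universal_model_names.py",
}


def is_allowed(path: str) -> bool:
    # Exact match, or the allowed entry is the trailing part of a longer path.
    return any(path == a or path.endswith("/" + a) for a in ALLOWED)


def extra_files(changed: list[str]) -> list[str]:
    """Return the changed paths that a predictions-only submission should drop."""
    return sorted(p for p in changed if not is_allowed(p))


# Illustrative input; hybrid_router/router.py is a hypothetical file name.
changed = [
    "router_inference/config/hybrid-router.json",
    "metrics.json",
    "uv.lock",
    "hybrid_router/router.py",
]
print(extra_files(changed))  # ['hybrid_router/router.py', 'metrics.json', 'uv.lock']
```

An empty list means the PR is down to the files the leaderboard needs.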

@mikemao27

Contributor Author

I think the mypy pre-commit errors come from llm_inference.pipeline.py, a base repository file that this PR doesn't modify, so they shouldn't be related to my changes. Pre-commit checks passed on earlier iterations of this PR, so I'll ignore these errors for now and continue with evaluation. Let me know if this error does need attention (I believe other similar PRs have also hit pre-commit errors of some sort, possibly different from mine).

@mikemao27

Contributor Author

/evaluate

mikemao27 closed this on Jun 18, 2026