add GHA skeleton for routing regression pipeline #12
key decisions in this skeleton:
nilsnolde
left a comment
thanks. I think for now you can ignore the comments. I needed to think about this a bit more and see what makes most sense.
IMO we have to split this up into multiple ymls. one for building tiles, one for routing regression tests, at least. so we can keep the layout sane.
you know what @Sherley-Sonali. best you research how to semantically do this best with multiple files so the layout makes most sense. keep some stuff in mind for research:
- workflow_call does "reusable workflows" (could be e.g. the testing yml calling the build_tiles workflow)
- we can have our own .github/workflows/actions/build-valhalla.yml which centralizes the build for both (and eventually most) workflows
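A minimal sketch of the reusable-workflow idea from the list above. It assumes a shared build workflow with a single valhalla_ref input; the job names and the placeholder step are illustrative and not taken from the repository. As far as I know, GitHub only resolves reusable workflows that sit directly in .github/workflows/, so the sketch uses that location rather than a subdirectory.

```yaml
# .github/workflows/build-valhalla.yml (callee; the input name is an assumption)
name: build-valhalla
on:
  workflow_call:
    inputs:
      valhalla_ref:
        description: "Git ref of Valhalla to build"
        required: true
        type: string
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for the real build steps
        env:
          VALHALLA_REF: ${{ inputs.valhalla_ref }}
        run: echo "would build Valhalla at $VALHALLA_REF"
---
# .github/workflows/build-tiles.yml (caller; hypothetical layout)
name: build-tiles
on:
  workflow_dispatch:
    inputs:
      valhalla_ref:
        description: "Git ref of Valhalla to build tiles with"
        required: true
        type: string
jobs:
  build-valhalla:
    # a job that calls a reusable workflow has no steps of its own
    uses: ./.github/workflows/build-valhalla.yml
    with:
      valhalla_ref: ${{ inputs.valhalla_ref }}
    secrets: inherit
```

The point of the split is that the build logic lives in one file and every other workflow only passes a ref to it.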
    - name: Download tiles artifact
      uses: actions/download-artifact@v4
      with:
        name: ${{ inputs.tiles_artifact }}
per your description of the input: valhalla-tiles-master will download what from where? we have nothing building a tileset for master yet.
I'm not quite sure how to handle this exactly.. artifacts work per workflow run, meaning you'd always need to build both "old" and "new" graphs in each run. if we use artifacts for this. let me think for a second.
this still stands @Sherley-Sonali, I still don't see how this could make sense with artifacts.
nothing here actually says "build tiles". there's apparently some GHA config for it, but who calls that?
…g-regression workflows
Here's a first pass at the layout split, based on the points you raised:
Existing TODOs (admins.sqlite, run_routes.py, diff_responses.py, push_results.py, RAD-data push race) carried over as-is - this PR is just the layout split, no new functionality yet.

@Sherley-Sonali hm, seems I worded it wrong: of course a lot of my comments are still valid! and you need to address them! just not the ones which try to review logic which would be changing anyways with my suggestion to split it into multiple files.
nilsnolde
left a comment
still some way to go. also keep in mind that eventually we need 2 refs to build tiles from, even if we just start with "old" for now.
        secrets: inherit

      build-tiles:
        runs-on: ubuntu-24.04
Suggested change:
    - runs-on: ubuntu-24.04
    + runs-on: ubuntu-latest
we want to know early if there's anything failing with current releases
      build-valhalla:
        uses: ./.github/workflows/build-valhalla.yml
        with:
          valhalla_ref: ${{ inputs.valhalla_ref }}
        secrets: inherit
build-valhalla now produces a wheel artifact that build-tiles downloads and installs, no rebuilding inline
    - name: Rebuild Valhalla
      run: |
        cmake -B valhalla-src/build -S valhalla-src \
          -DCMAKE_BUILD_TYPE=Release \
          -DENABLE_PYTHON_BINDINGS=ON \
          -DENABLE_SERVICES=OFF \
          -DENABLE_TESTS=OFF \
          -DENABLE_BENCHMARKS=OFF \
          -DENABLE_CCACHE=ON \
          -DENABLE_TOOLS=OFF \
          -DENABLE_GEOTIFF=OFF \
          -DENABLE_LZ4=OFF
        make -C valhalla-src/build -j$(nproc)
        sudo make -C valhalla-src/build install
why do the build "inline" if there's a drop-in workflow call?
addressed in redesign
        valhalla_build_admins -c valhalla.json data/liechtenstein_graph.osm.pbf
        valhalla_build_config \
          --mjolnir-tile-dir valhalla_tiles > valhalla.json
        valhalla_build_tiles -c valhalla.json data/liechtenstein_graph.osm.pbf
verified locally now
please keep in mind to always verify locally before you actually commit, or at least before you ask for review! it's part of the learning process.
    - name: Build Valhalla at router ref
      run: |
        cmake -B valhalla-src/build -S valhalla-src \
          -DCMAKE_BUILD_TYPE=Release \
          -DENABLE_PYTHON_BINDINGS=ON \
          -DENABLE_SERVICES=OFF \
          -DENABLE_TESTS=OFF \
          -DENABLE_BENCHMARKS=OFF \
          -DENABLE_CCACHE=ON \
          -DENABLE_TOOLS=OFF \
          -DENABLE_GEOTIFF=OFF \
          -DENABLE_LZ4=OFF
        make -C valhalla-src/build -j$(nproc)
        sudo make -C valhalla-src/build install
no inline build now
tiles are built by running And on needing - 2 refs to build tiles: |
so to summarize how it currently would work when I'd like to run a regression test some day:
1. I run build-tiles.yml manually with the right git SHAs, which builds the bindings for each SHA
2. I wait for that process to finish: AFAIK I have to refresh the PR page constantly, bcs I will not get notified from GH when the tile build run exited with success
3. then I manually run routing-regression.yml with the same input as build-tiles.yml plus a tiles_run_id, which I have to hunt down from the 1. step (build-tiles.yml run)
4. in routing-regression.yml, for each SHA we again build valhalla (as we already did for the build-tiles.yml step, but even more do I wonder why you sync wheels via artifacts in the first place), download the valhalla wheel & the graph from GH artifacts via a brittle tiles_run_id, then push the route responses before
5. diffing them in another job
everything before 5. is not an ergonomic workflow and very very human error prone.
I see it this way: the routing-regression.yml is the only thing we ever need to manually run for a route regression test. everything else derives from this one source of truth, which orchestrates everything else. build-valhalla.yml is run exactly once per SHA, build-tiles.yml is run inline for each SHA in the route regression workflow.
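One possible reading of that single-entry-point layout, as a rough sketch. The input names old_ref and new_ref are assumptions, and the sketch also assumes that both build-valhalla.yml and build-tiles.yml expose a workflow_call trigger with a valhalla_ref input. The final job is only a placeholder for running the routes and diffing the responses.

```yaml
# .github/workflows/routing-regression.yml (sketch; input names are assumptions)
name: routing-regression
on:
  workflow_dispatch:
    inputs:
      old_ref:
        description: "Baseline Valhalla ref"
        required: true
        type: string
      new_ref:
        description: "Candidate Valhalla ref"
        required: true
        type: string
jobs:
  build-valhalla:
    # one call of the shared build workflow per ref
    strategy:
      matrix:
        ref: ["${{ inputs.old_ref }}", "${{ inputs.new_ref }}"]
    uses: ./.github/workflows/build-valhalla.yml
    with:
      valhalla_ref: ${{ matrix.ref }}
    secrets: inherit
  build-tiles:
    # tiles are built inside the same run, so no run ID has to be looked up
    needs: build-valhalla
    strategy:
      matrix:
        ref: ["${{ inputs.old_ref }}", "${{ inputs.new_ref }}"]
    uses: ./.github/workflows/build-tiles.yml
    with:
      valhalla_ref: ${{ matrix.ref }}
    secrets: inherit
  diff:
    needs: build-tiles
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for running routes and diffing responses
        run: echo "run routes per ref, then diff the responses"
```

With this shape the only manual inputs are the two refs; everything else is derived inside one workflow run.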
        pip wheel . --no-build-isolation --wheel-dir /tmp/valhalla-dist \
          -Ccmake.build-type=Release \
          -Ccmake.define.ENABLE_PYTHON_BINDINGS=ON \
          -Ccmake.define.ENABLE_TESTS=OFF \
          -Ccmake.define.ENABLE_SERVICES=OFF
I don't think this'll currently work well for caching. ccache uses a lot of heuristics to invalidate cache hits and I'm quite sure compilation commands might use e.g. -I </tmp/pip-build-xxx> absolute paths which would trigger invalidation. that is bcs pip wheel uses /tmp to build the wheel.
this is just an educated guess. can you make sure that doesn't happen currently by simply executing this command twice on your local machine (of course with ccache installed)? you'll need to watch ccache hits/stats before & after the second run.
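The check described above could look roughly like the following step; the same commands can be run in a local shell instead. The step name, the working directory, and the two output directories are assumptions, and the pip flags are copied unchanged from the snippet under review. It only uses ccache --zero-stats and ccache --show-stats to compare the two builds.

```yaml
# Hypothetical debugging step, not part of the PR
- name: Compare ccache stats across two wheel builds
  working-directory: valhalla-src   # assumed checkout location
  run: |
    # first build: with a cold cache this should be mostly misses
    ccache --zero-stats
    pip wheel . --no-build-isolation --wheel-dir /tmp/valhalla-dist-1 \
      -Ccmake.build-type=Release \
      -Ccmake.define.ENABLE_PYTHON_BINDINGS=ON
    ccache --show-stats
    # second build: hits are only expected if the compile commands are stable
    ccache --zero-stats
    pip wheel . --no-build-isolation --wheel-dir /tmp/valhalla-dist-2 \
      -Ccmake.build-type=Release \
      -Ccmake.define.ENABLE_PYTHON_BINDINGS=ON
    ccache --show-stats
```

If the second run still reports mostly misses, changing absolute paths in the compile commands would be one plausible cause, as suspected in the comment.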
and what happened to #12 (comment)? the way you use it now would need ENABLE_TOOLS=ON but the others are still not necessary.
    - name: Install Python build dependencies
      run: pip install scikit-build-core pyproject-metadata setuptools-scm pybind11
this is pretty bad for maintenance: whenever we add a build dependency, we now have to update two places! remove --no-build-isolation and this step
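A sketch of what the wheel build might look like with pip's default build isolation. In that mode pip installs the build requirements declared under [build-system] in pyproject.toml into a temporary environment, so a separate dependency-install step is not needed. The step name and working directory are assumptions; the -C flags are copied unchanged from the snippet under review.

```yaml
# Hypothetical step; build dependencies come from pyproject.toml only
- name: Build Valhalla wheel
  working-directory: valhalla-src   # assumed checkout location
  run: |
    pip wheel . --wheel-dir /tmp/valhalla-dist \
      -Ccmake.build-type=Release \
      -Ccmake.define.ENABLE_PYTHON_BINDINGS=ON \
      -Ccmake.define.ENABLE_TESTS=OFF \
      -Ccmake.define.ENABLE_SERVICES=OFF
```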
    - name: Install system dependencies
      run: bash valhalla-src/scripts/install-linux-deps.sh
first install, then restore cache
    - name: Restore ccache
      uses: tespkg/actions-cache/restore@v1
      with:
        endpoint: ${{ secrets.HETZNER_S3_ENDPOINT }}
where did you get this secret from?
    - name: Install Valhalla from wheel
      run: pip install /tmp/valhalla-dist/*.whl
we do want to control the python version this runs with.
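One common way to pin the interpreter is actions/setup-python ahead of the install step. The sketch below assumes Python 3.12 purely for illustration; the thread does not say which version is wanted.

```yaml
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.12"   # assumed version, adjust as needed
- name: Install Valhalla from wheel
  run: pip install /tmp/valhalla-dist/*.whl
```

Quoting the version keeps YAML from reading values such as 3.10 as the number 3.1.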
    - name: Download tiles from build-tiles run
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        gh run download ${{ inputs.tiles_run_id }} \
          --repo "${{ github.repository }}" \
          --pattern "valhalla-tiles-*" \
          --dir valhalla_tiles
        shopt -s dotglob
        mv valhalla_tiles/valhalla-tiles-*/* valhalla_tiles/ 2>/dev/null || true
it's not priority, but worth mentioning: one thing we said we want to look out for is future compatibility with e.g. scenario old graph/new graph. currently it's fixed to using a single graph, whatever tiles_run_id references. could do 2 of those, but then I'd need to look up 4 things: old/new SHA, old/new run_id. all separately. this is super duper error prone and can really really take time to realize, when smth nasty seems to happen on the diffs we kick off. avoid at all costs!
don't try hard to keep that requirement in mind for the next round of edits. we can deal with more scenarios once we get there. I just needed to mention it.
also don't sync via artifacts. let the graphs be uploaded to S3, that's a much more idiomatic place for them, especially the master graph. PR graphs should be uploaded to S3 as well, they'll be cleaned once a PR closes.
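A rough sketch of an S3 upload step using the AWS CLI against an S3-compatible endpoint. The bucket name, the key layout (one archive per ref), and the credential secret names are all hypothetical. HETZNER_S3_ENDPOINT is taken from the snippet in this thread, and whether that secret exists is not confirmed here. The tar archive is only meant as a transport format in this sketch.

```yaml
# Hypothetical step; bucket, key layout and credential secrets are assumptions
- name: Upload graph to S3
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.S3_ACCESS_KEY }}        # hypothetical secret name
    AWS_SECRET_ACCESS_KEY: ${{ secrets.S3_SECRET_KEY }}    # hypothetical secret name
    S3_ENDPOINT: ${{ secrets.HETZNER_S3_ENDPOINT }}
    S3_BUCKET: example-valhalla-graphs                     # hypothetical bucket
    VALHALLA_REF: ${{ inputs.valhalla_ref }}
  run: |
    # pack the tile directory and store it under a key derived from the ref
    tar -cf valhalla_tiles.tar -C valhalla_tiles .
    aws s3 cp valhalla_tiles.tar \
      "s3://${S3_BUCKET}/graphs/${VALHALLA_REF}/valhalla_tiles.tar" \
      --endpoint-url "${S3_ENDPOINT}"
```

Keying the object by ref means a later job can locate the graph from the ref alone, without a run ID.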
        valhalla_build_config \
          --mjolnir-tile-dir "$(pwd)/valhalla_tiles" \
          > /tmp/valhalla.json
pass the dir (after your next edit: the tar of the dir) to run_routes.py instead. pyvalhalla can deal with simply the path, no need for an external config!
    - name: Upload responses artifact
      uses: actions/upload-artifact@v4
      with:
        name: responses-${{ matrix.router.name }}
        path: /tmp/responses-${{ matrix.router.name }}.jsonl
        retention-days: 7
this is literally the only thing we want uploaded to GH artifacts.
    tiles_run_id:
      description: "Run ID from a successful build-tiles.yml run"
      required: true
this is pretty brittle and awkward to hunt down.
Adds .github/workflows/routing_regression.yml - skeleton for the routing regression pipeline.
closes #10
Used AI as a drafting/thinking aid for implementation and test design. All changes were reviewed, tested, and understood before submission.