fix: make FTS finalization idempotent#7272
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
40e2b13 to
4d1f58f
Compare
4d1f58f to
22065f8
Compare
22065f8 to
8c9cc55
Compare
8c9cc55 to
d9d2270
Compare
Xuanwo
left a comment
There was a problem hiding this comment.
Two comments on FTS finalization.
| for suffix in PARTITION_FILE_SUFFIXES { | ||
| let staged_path = staged_partition_file_path(old_id, suffix); | ||
| let final_path = partition_file_path(new_id, suffix); | ||
| let _ = store.delete_index_file(&final_path).await; |
There was a problem hiding this comment.
This is still unsafe with concurrent finalizers. metadata.lance is only checked at entry, so another finalizer may commit it before this one reaches delete_index_file(final_path). If this task then crashes after the delete, retries will no-op on the committed metadata while a final part_* file is missing.
Can we avoid deleting canonical final files, or re-check the commit marker before any destructive final-path operation?
There was a problem hiding this comment.
Fixed in e9ccb116a by removing the pre-copy delete of canonical final part_<id>_* files. I agree full atomic multi-object finalize is not possible with the current object-store abstraction, so the guarantee here is idempotence: concurrent/retry finalizers copy deterministic staged bytes to the same final names, metadata.lance remains the commit marker, and post-commit cleanup only touches staging/.
I also extended test_merge_index_files_rewrites_partial_final_files_from_staging so it still proves partial final files are repaired from staging, and now asserts merge does not delete root part_* files first.
@Xuanwo could you take another look?
| progress.stage_complete("write_merged_metadata").await?; | ||
|
|
||
| // Cleanup partition metadata files | ||
| // Cleanup staged partition metadata files |
There was a problem hiding this comment.
Ignoring staging cleanup errors can commit staging/... into IndexMetadata.files. The commit path finalizes, then lists the whole index dir, and that listing does not filter staging files.
Can we filter staging/, return the canonical final file list from finalize, or fail the commit when cleanup fails?
There was a problem hiding this comment.
Addressed by filtering staging/ from FTS segment IndexMetadata.files after finalization. Cleanup remains best-effort, but stale staged objects are treated as transient and are not recorded in committed metadata. I also extended test_commit_existing_index_segments_finalizes_fts_segments to create a leftover staging/orphan.lance after final metadata exists and assert the committed file list excludes it.
d9d2270 to
92f4484
Compare
92f4484 to
e9ccb11
Compare
Xuanwo
left a comment
There was a problem hiding this comment.
Thank you working for this!
Summary
Make distributed FTS finalization idempotent and object-store friendly. The old finalize flow was not retry-safe because it moved build-only
part_<old_id>_*partition files before writing the finalmetadata.lance; if the process stopped in that window, a retry saw no final metadata but the original source files had already been deleted.The new flow treats
metadata.lanceas the commit marker and keeps all pre-commit partition sources under stablestaging/paths. Finalization copies staged files to dense final names without deleting the staged sources or canonical final paths first, writes final metadata, then removes staging best-effort. Any leftoverstaging/objects are filtered from committed FTSIndexMetadata.files, so cleanup failures do not make transient files part of the committed index.Implementation
staging/part_<old_id>_*paths.part_<id>_*files and finalmetadata.lancedirectly, with no dense rewrite.IndexStore::copy_index_file_to(source, dest, dest_store)for source-preserving copy-to-new-name;LanceIndexStoreimplements it with object-store copy and supports nested relative paths likestaging/....from_existing_index(..., fragment_mask=Some(...)), before advertising those partitions in staged metadata.merge_index_files, discover staged metadata, sort original partition IDs, map them densely to0..N, copystaging/part_<old_id>_*to finalpart_<new_id>_*without pre-deleting final paths, and then write finalmetadata.lance.staging/files from FTS segmentIndexMetadata.filesduring segment commit.metadata.lancemakes finalize a no-op; without it, staged sources remain authoritative and any partial final files are replaceable from staging.Added regressions for staged distributed writes, distributed-from-existing finalize, partial final-file retry without root
part_*deletion, preserving staging when final metadata writing fails, and excluding stale staging files from committed metadata. Validated locally with Rust fmt/tests/clippy, segmented FTS integration tests, Python extension build, the targeted Python progress test, anduv run make lint-rust.