feat: support cleanup explain for python and java #7248
Xuanwo left a comment
I found two issues in the Java cleanup explain binding that should be fixed before this lands.
First, the JNI conversion should release local references while building cleanup explanation lists. cleanup_candidate_files_to_java can convert up to the Rust default of 1000 candidate files, and each entry creates local refs for the path string, kind string, and CleanupCandidateFile. Since the loop does not use a local frame or delete those temporary refs, dataset.cleanup(policy).explain() can overflow the JNI local reference table on larger explanations. The same pattern should be handled for referenced branches and warnings.
Second, Java should expose the candidate file limit that Rust core and Python already expose. Rust has with_max_candidate_files, and Python exposes this as max_files, but Java currently only has cleanup(policy).explain() and the JNI path calls .explain() directly. That fixes Java at the default 1000 candidate files, so users cannot request more details when candidateFilesTruncated is true, and cannot lower the amount of detail returned. A Java surface like cleanup(policy).withMaxCandidateFiles(n).explain() or explain(n) should pass the value through JNI to Rust with_max_candidate_files(n).
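To make the second point concrete, here is a minimal, self-contained sketch of the fluent shape suggested above. Every type in it (CleanupExplainSketch, CleanupOperation, CleanupExplanation, fakeCandidates) is a hypothetical stand-in and not the actual Lance Java API. The in-memory truncation only simulates what the native side would do; in a real binding, explain() would forward the limit through JNI to Rust's with_max_candidate_files(n).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real Lance Java API.
class CleanupExplainSketch {
    // Mirrors the Rust default of 1000 candidate files mentioned in the review.
    static final int DEFAULT_MAX_CANDIDATE_FILES = 1000;

    // Hypothetical result type: the returned paths plus the truncation flag.
    static final class CleanupExplanation {
        final List<String> candidateFiles;
        final boolean candidateFilesTruncated;

        CleanupExplanation(List<String> candidateFiles, boolean candidateFilesTruncated) {
            this.candidateFiles = candidateFiles;
            this.candidateFilesTruncated = candidateFilesTruncated;
        }
    }

    // Hypothetical operation object, as returned by dataset.cleanup(policy).
    static final class CleanupOperation {
        private final List<String> allCandidates;
        private int maxCandidateFiles = DEFAULT_MAX_CANDIDATE_FILES;

        CleanupOperation(List<String> allCandidates) {
            this.allCandidates = allCandidates;
        }

        // Lets callers raise or lower the amount of detail returned.
        CleanupOperation withMaxCandidateFiles(int n) {
            if (n < 0) {
                throw new IllegalArgumentException("maxCandidateFiles must be >= 0");
            }
            this.maxCandidateFiles = n;
            return this;
        }

        // Simulated here; a real binding would pass maxCandidateFiles to a
        // native method that calls with_max_candidate_files(n) before explain().
        CleanupExplanation explain() {
            int limit = Math.min(maxCandidateFiles, allCandidates.size());
            List<String> kept = Collections.unmodifiableList(
                new ArrayList<>(allCandidates.subList(0, limit)));
            return new CleanupExplanation(kept, allCandidates.size() > limit);
        }
    }

    // Builds placeholder paths so the sketch runs without a dataset.
    static List<String> fakeCandidates(int count) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            paths.add("data/file-" + i + ".lance");
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> candidates = fakeCandidates(1500);

        // Default limit: the explanation is cut off and flagged as truncated.
        CleanupExplanation byDefault = new CleanupOperation(candidates).explain();
        System.out.println(byDefault.candidateFiles.size() + " truncated=" + byDefault.candidateFilesTruncated);

        // Caller asks for more detail after seeing the truncation flag.
        CleanupExplanation full = new CleanupOperation(candidates).withMaxCandidateFiles(2000).explain();
        System.out.println(full.candidateFiles.size() + " truncated=" + full.candidateFilesTruncated);
    }
}
```

A builder-style withMaxCandidateFiles(n) keeps cleanup(policy).explain() source-compatible for existing callers, whereas an explain(n) overload would be shorter but harder to extend if more explain options are added later.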
Summary
Add Java and PyLance APIs for cleanup explain.
- Python: Dataset.explain_cleanup_old_versions(...) with cleanup explanation result types (lance package)
- Java: dataset.cleanup(policy).explain() / .execute() operation API
- Java: cleanupWithPolicy(policy) as a compatibility wrapper for execute
Testing
- cargo fmt --all
- cargo fmt --manifest-path ./java/lance-jni/Cargo.toml --all
- cargo check --manifest-path python/Cargo.toml
- cargo check --manifest-path ./java/lance-jni/Cargo.toml
- ./mvnw compile
- ./mvnw -Dtest=CleanupTest test
- uv run --frozen --python 3.12 pytest python/tests/test_dataset.py::test_explain_cleanup_old_versions