
feat: support cleanup explain for python and java#7248

Open
yanghua wants to merge 3 commits into lance-format:main from yanghua:feat-cleanup-explain-py-java

yanghua (Collaborator) commented Jun 12, 2026

Summary

Add Java and PyLance APIs for cleanup explain.

  • Add PyLance Dataset.explain_cleanup_old_versions(...) with cleanup explanation
    result types
  • Export PyLance cleanup explanation types from the top-level lance package
  • Add Java dataset.cleanup(policy).explain() / .execute() operation API
  • Keep Java cleanupWithPolicy(policy) as a compatibility wrapper around execute()
  • Add JNI conversions and tests for cleanup explanations
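The operation-style API in the summary can be sketched roughly as follows. This is a minimal, self-contained sketch under stated assumptions: only cleanup(policy), explain(), execute() and cleanupWithPolicy(policy) come from the PR. The Sketch* classes, the version-based policy, and the int return type of execute() are hypothetical, and native calls are replaced by in-memory logic so the file runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy: remove every version below minVersionToKeep.
final class SketchPolicy {
    final long minVersionToKeep;

    SketchPolicy(long minVersionToKeep) {
        this.minVersionToKeep = minVersionToKeep;
    }
}

// Hypothetical explanation: the versions a cleanup would remove.
final class SketchExplanation {
    final List<Long> versionsToRemove;

    SketchExplanation(List<Long> versionsToRemove) {
        this.versionsToRemove = versionsToRemove;
    }
}

// One operation object bound to a dataset and a policy; the caller chooses
// explain() (dry run) or execute() (mutating).
final class SketchCleanupOperation {
    private final SketchDataset dataset;
    private final SketchPolicy policy;

    SketchCleanupOperation(SketchDataset dataset, SketchPolicy policy) {
        this.dataset = dataset;
        this.policy = policy;
    }

    // Reports what would be removed and leaves the dataset untouched.
    SketchExplanation explain() {
        return new SketchExplanation(dataset.versionsMatching(policy));
    }

    // Removes the matching versions and returns how many were removed.
    int execute() {
        List<Long> doomed = dataset.versionsMatching(policy);
        dataset.versions.removeAll(doomed);
        return doomed.size();
    }
}

// In-memory stand-in for a dataset; a real binding would call into Rust over JNI.
final class SketchDataset {
    final List<Long> versions = new ArrayList<>();

    SketchDataset(long... initialVersions) {
        for (long v : initialVersions) {
            versions.add(v);
        }
    }

    List<Long> versionsMatching(SketchPolicy policy) {
        List<Long> out = new ArrayList<>();
        for (long v : versions) {
            if (v < policy.minVersionToKeep) {
                out.add(v);
            }
        }
        return out;
    }

    SketchCleanupOperation cleanup(SketchPolicy policy) {
        return new SketchCleanupOperation(this, policy);
    }

    // Compatibility wrapper: the older entry point just delegates to execute().
    int cleanupWithPolicy(SketchPolicy policy) {
        return cleanup(policy).execute();
    }
}
```

Routing the old entry point through cleanup(policy).execute() means explain() and execute() share one policy-to-candidates path, so a dry run and a real run cannot drift apart.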

Testing

  • cargo fmt --all
  • cargo fmt --manifest-path ./java/lance-jni/Cargo.toml --all
  • cargo check --manifest-path python/Cargo.toml
  • cargo check --manifest-path ./java/lance-jni/Cargo.toml
  • ./mvnw compile
  • ./mvnw -Dtest=CleanupTest test
  • uv run --frozen --python 3.12 pytest python/tests/test_dataset.py::test_explain_cleanup_old_versions

github-actions (bot) added the labels A-python (Python bindings), A-java (Java bindings + JNI), and enhancement (New feature or request) on Jun 12, 2026
yanghua marked this pull request as ready for review on June 13, 2026 at 02:13

Xuanwo (Collaborator) left a comment

I found two issues in the Java cleanup explain binding that should be fixed before this lands.

First, the JNI conversion should release local references while it builds the cleanup explanation lists. cleanup_candidate_files_to_java can convert up to 1000 candidate files (the Rust default), and each entry creates three local refs: the path string, the kind string, and the CleanupCandidateFile object. Because the loop neither uses a local frame nor deletes those temporary refs, a full list keeps about 3,000 local refs alive at once, so dataset.cleanup(policy).explain() can overflow the JNI local reference table on larger explanations. The referenced-branch and warning conversions follow the same pattern and need the same fix.
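To make the arithmetic concrete, here is a toy model in plain Java, not the JNI API: a bounded table of live references whose frames behave like JNI's PushLocalFrame / PopLocalFrame(NULL). The table capacity is an arbitrary number chosen for the model, not a real VM limit, and the class names are hypothetical. It shows that without a per-entry frame the peak grows with the number of entries, while with one it stays constant.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model, not JNI: a bounded table of live local references with frames.
final class LocalRefTable {
    private final int capacity;
    private final Deque<Integer> frameMarks = new ArrayDeque<>();
    private int live = 0;
    private int peak = 0;

    LocalRefTable(int capacity) {
        this.capacity = capacity;
    }

    // In the model, creating a ref past capacity throws; what a real VM
    // does on overflow is implementation-specific.
    void newLocalRef() {
        if (live >= capacity) {
            throw new IllegalStateException("local reference table overflow at " + live);
        }
        live++;
        peak = Math.max(peak, live);
    }

    // Remember how many refs were live when the frame was opened.
    void pushFrame() {
        frameMarks.push(live);
    }

    // Free every ref created since the matching pushFrame().
    void popFrame() {
        live = frameMarks.pop();
    }

    int peak() {
        return peak;
    }
}

final class ConversionModel {
    // path string, kind string, and the CleanupCandidateFile object
    static final int REFS_PER_ENTRY = 3;

    // Models converting n candidate files into a Java list and returns the
    // peak number of simultaneously live local refs.
    static int convert(LocalRefTable table, int n, boolean frameEachEntry) {
        table.newLocalRef(); // the java.util.List being filled
        for (int i = 0; i < n; i++) {
            if (frameEachEntry) {
                table.pushFrame();
            }
            for (int r = 0; r < REFS_PER_ENTRY; r++) {
                table.newLocalRef();
            }
            // Once the entry has been added to the list, the list keeps the
            // object reachable, so the per-entry local refs can be freed.
            if (frameEachEntry) {
                table.popFrame();
            }
        }
        return table.peak();
    }
}
```

With 1000 entries the unframed conversion peaks at 1 + 3 × 1000 = 3001 live refs, while the framed one never exceeds 4. On the Rust side, the equivalent would be a local frame per entry or an explicit DeleteLocalRef for each temporary.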

Second, Java should expose the candidate file limit that the Rust core and Python already expose. Rust has with_max_candidate_files, and Python exposes it as max_files, but Java currently has only cleanup(policy).explain(), and the JNI path calls .explain() directly. That pins Java to the default of 1000 candidate files, so users can neither request more detail when candidateFilesTruncated is true nor reduce the amount of detail returned. A Java surface such as cleanup(policy).withMaxCandidateFiles(n).explain() or explain(n) should pass the value through JNI to Rust's with_max_candidate_files(n).
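A rough sketch of the suggested builder follows. It assumes a simple shape: withMaxCandidateFiles(n), candidateFilesTruncated, with_max_candidate_files and the 1000 default come from the review, while the Sketch* classes, the non-negative check, and the in-memory nativeExplain stand-in are hypothetical. It shows the limit being threaded through to the native call and the truncation flag being derived from it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type; the field name candidateFilesTruncated follows the review.
final class SketchExplainResult {
    final List<String> candidateFiles;
    final boolean candidateFilesTruncated;

    SketchExplainResult(List<String> candidateFiles, boolean candidateFilesTruncated) {
        this.candidateFiles = candidateFiles;
        this.candidateFilesTruncated = candidateFilesTruncated;
    }
}

// Hypothetical builder returned by cleanup(policy).
final class SketchCleanupBuilder {
    // Default limit the review cites for the Rust core.
    static final int DEFAULT_MAX_CANDIDATE_FILES = 1000;

    private final List<String> allCandidates; // stands in for dataset + policy
    private int maxCandidateFiles = DEFAULT_MAX_CANDIDATE_FILES;

    SketchCleanupBuilder(List<String> allCandidates) {
        this.allCandidates = allCandidates;
    }

    // Rejects negative values before they reach native code, where the limit
    // is assumed (not confirmed) to be an unsigned integer.
    SketchCleanupBuilder withMaxCandidateFiles(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("maxCandidateFiles must be >= 0, got " + n);
        }
        this.maxCandidateFiles = n;
        return this;
    }

    SketchExplainResult explain() {
        return nativeExplain(allCandidates, maxCandidateFiles);
    }

    // Stand-in for the JNI entry point, which would forward the limit to
    // with_max_candidate_files(n) on the Rust side before explaining.
    private static SketchExplainResult nativeExplain(List<String> all, int max) {
        int kept = Math.min(all.size(), max);
        boolean truncated = all.size() > max;
        return new SketchExplainResult(new ArrayList<>(all.subList(0, kept)), truncated);
    }
}
```

An explain(int) overload could simply delegate to withMaxCandidateFiles(n).explain(). The builder form leaves the existing no-argument explain() unchanged.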

2 participants