feat: configure blob inline threshold per column #7269

Merged
Xuanwo merged 5 commits into main from xuanwo/blob-v2-inline-threshold on Jun 15, 2026

@Xuanwo Xuanwo commented Jun 14, 2026

Collaborator

This adds per-column blob v2 inline threshold metadata so callers can choose when a blob column moves from inline data-file storage to packed sidecar storage, without changing the existing packed sidecar rolling option.

The threshold is stored on the blob field metadata, matching the existing dedicated blob threshold model. Existing blob columns keep their policy in the dataset schema; appends that explicitly provide different threshold metadata are rejected instead of silently ignoring the input schema. The Python and Rust helpers validate threshold values at the API boundary so invalid values do not silently fall back to defaults.
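
As a rough illustration of what a per-column threshold means for callers, here is a minimal sketch of the inline versus packed decision. The `BlobLayout` enum and `choose_layout` function are hypothetical names, not the API added by this PR, and the boundary rule (a value exactly at the threshold stays inline) is an assumption.

```rust
/// Where a single blob value ends up. Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum BlobLayout {
    /// Stored inside the data file next to the other columns.
    Inline,
    /// Stored in a packed sidecar file.
    PackedSidecar,
}

/// Pick a layout for one blob value given its column's inline threshold.
/// Assumption: values at or below the threshold stay inline.
fn choose_layout(blob_len: usize, inline_threshold: usize) -> BlobLayout {
    if blob_len <= inline_threshold {
        BlobLayout::Inline
    } else {
        BlobLayout::PackedSidecar
    }
}
```

With two columns configured differently, the same 1 KiB value can land in different places: `choose_layout(1024, 4096)` keeps it inline, while `choose_layout(1024, 512)` sends it to the packed sidecar.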

Closes #7268.

@github-actions github-actions Bot added the A-python (Python bindings), A-docs (Documentation), and enhancement (New feature or request) labels Jun 14, 2026

codecov Bot commented Jun 14, 2026

Codecov Report

❌ Patch coverage is 96.81979% with 18 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/dataset/write.rs | 78.78% | 11 Missing and 3 partials ⚠️ |
| rust/lance/src/dataset/blob.rs | 99.12% | 3 Missing and 1 partial ⚠️ |

@Xuanwo Xuanwo marked this pull request as ready for review June 15, 2026 09:03
Comment thread rust/lance/src/dataset/write.rs Outdated
    continue;
};

if dataset_field.metadata.get(key) != Some(input_value) {

Contributor

This compares raw metadata, so appending a schema that explicitly sets the default threshold to an older/default dataset with no key will fail even though the effective threshold is unchanged. Please compare normalized threshold values.
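
A minimal sketch of what comparing normalized values could look like, using plain `HashMap` metadata rather than the real field types. The key string, the default value, and both function names are placeholders chosen for illustration; the actual constants are not shown in this thread.

```rust
use std::collections::HashMap;

// Placeholder key string and default; the real values are assumptions here.
const THRESHOLD_KEY: &str = "example:blob-inline-size-threshold";
const DEFAULT_THRESHOLD: usize = 65_536;

/// Resolve the effective threshold for a field.
/// A missing key means the default; an unparsable value is an error.
fn effective_threshold(metadata: &HashMap<String, String>) -> Result<usize, String> {
    match metadata.get(THRESHOLD_KEY) {
        None => Ok(DEFAULT_THRESHOLD),
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|e| format!("invalid {} value {:?}: {}", THRESHOLD_KEY, raw, e)),
    }
}

/// Compare two fields by effective threshold instead of by raw metadata strings.
fn thresholds_match(
    dataset: &HashMap<String, String>,
    input: &HashMap<String, String>,
) -> Result<bool, String> {
    Ok(effective_threshold(dataset)? == effective_threshold(input)?)
}
```

Under this sketch, a dataset field with no key and an input field that spells out the default explicitly compare as equal, which is the case the raw comparison rejects.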

Comment thread rust/lance/src/dataset/blob.rs Outdated
field
    .metadata()
    .get(BLOB_INLINE_SIZE_THRESHOLD_META_KEY)
    .and_then(|value| value.parse::<usize>().ok())

Contributor

Invalid threshold metadata is silently ignored here. Since callers can set these keys directly on the schema, a bad value can be persisted while writes fall back to the default layout. Please reject invalid values on the write path.
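
One possible shape for a strict parse, sketched over a plain `HashMap` instead of the real field type. The constant's string value, the `InvalidThreshold` error type, and `parse_inline_threshold` are hypothetical; the point is only that a parse failure becomes an `Err` rather than being folded into `None` by `.ok()`.

```rust
use std::collections::HashMap;
use std::fmt;

// Placeholder value; the real key string is not shown in this thread.
const BLOB_INLINE_SIZE_THRESHOLD_META_KEY: &str = "example:blob-inline-size-threshold";

/// Hypothetical error carrying the offending raw metadata value.
#[derive(Debug, PartialEq, Eq)]
struct InvalidThreshold {
    raw: String,
}

impl fmt::Display for InvalidThreshold {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "invalid blob inline threshold {:?}: expected a non-negative integer",
            self.raw
        )
    }
}

/// Ok(None) when the key is absent, Ok(Some(n)) when it parses,
/// and Err when the key is present but not a valid usize.
fn parse_inline_threshold(
    metadata: &HashMap<String, String>,
) -> Result<Option<usize>, InvalidThreshold> {
    metadata
        .get(BLOB_INLINE_SIZE_THRESHOLD_META_KEY)
        .map(|raw| {
            raw.parse::<usize>()
                .map_err(|_| InvalidThreshold { raw: raw.clone() })
        })
        .transpose()
}
```

A write path built on this could propagate the error with `?`, so a value such as `"abc"` or `"-1"` fails the write instead of being persisted alongside a default layout.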

Comment thread rust/lance/src/dataset/write.rs Outdated
let is_blob_v2_field = input_field
    .metadata
    .get(ARROW_EXT_NAME_KEY)
    .or_else(|| dataset_field.metadata.get(ARROW_EXT_NAME_KEY))

Contributor

This only falls back to the dataset extension when the input extension key is absent. If the input field has a non-blob extension plus blob threshold metadata, append validation skips the threshold checks and silently drops the input metadata. Please treat either side being blob v2 as requiring validation.
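
To make the difference concrete, here is a small sketch contrasting the fallback form with an either-side check, again over plain `HashMap` metadata. The extension name string `lance.blob.v2` and the function names are assumptions made for illustration; `ARROW:extension:name` is the standard Arrow extension metadata key.

```rust
use std::collections::HashMap;

const ARROW_EXT_NAME_KEY: &str = "ARROW:extension:name";
// Assumed extension name for blob v2 fields; not confirmed by this thread.
const BLOB_V2_EXT_NAME: &str = "lance.blob.v2";

fn is_blob_v2(metadata: &HashMap<String, String>) -> bool {
    metadata.get(ARROW_EXT_NAME_KEY).map(String::as_str) == Some(BLOB_V2_EXT_NAME)
}

/// Fallback form: the dataset side is consulted only when the input key is absent.
fn fallback_check(input: &HashMap<String, String>, dataset: &HashMap<String, String>) -> bool {
    input
        .get(ARROW_EXT_NAME_KEY)
        .or_else(|| dataset.get(ARROW_EXT_NAME_KEY))
        .map(String::as_str)
        == Some(BLOB_V2_EXT_NAME)
}

/// Either-side form: validation is required if either field is blob v2.
fn either_side_check(input: &HashMap<String, String>, dataset: &HashMap<String, String>) -> bool {
    is_blob_v2(input) || is_blob_v2(dataset)
}
```

When the input field carries some other extension name and the dataset field is blob v2, `fallback_check` returns `false` and the threshold checks are skipped, while `either_side_check` returns `true`.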

Comment thread rust/lance/src/dataset/write.rs
@Xuanwo Xuanwo merged commit f405b34 into main Jun 15, 2026
32 checks passed
@Xuanwo Xuanwo deleted the xuanwo/blob-v2-inline-threshold branch June 15, 2026 17:05

Development

Successfully merging this pull request may close these issues.

Make blob v2 inline threshold configurable per column

2 participants