feat: configure blob inline threshold per column #7269
continue;
};
if dataset_field.metadata.get(key) != Some(input_value) {
This compares raw metadata, so appending a schema that explicitly sets the default threshold to an older/default dataset with no key will fail even though the effective threshold is unchanged. Please compare normalized threshold values.
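One way to read this suggestion is to resolve both sides to an effective threshold before comparing. The sketch below is illustrative only: it models field metadata as a plain `HashMap`, and the key string, the default value, and the helper names `effective_threshold` and `thresholds_match` are hypothetical stand-ins rather than the PR's actual definitions.

```rust
use std::collections::HashMap;

// Hypothetical values for illustration; the real key string and default
// threshold are defined in the Lance source and may differ.
const BLOB_INLINE_SIZE_THRESHOLD_META_KEY: &str = "example:blob-inline-size-threshold";
const DEFAULT_BLOB_INLINE_SIZE_THRESHOLD: usize = 64 * 1024;

/// Resolve the threshold a field would actually use: the parsed metadata
/// value when the key is present, the default when it is absent.
fn effective_threshold(metadata: &HashMap<String, String>) -> Result<usize, String> {
    match metadata.get(BLOB_INLINE_SIZE_THRESHOLD_META_KEY) {
        None => Ok(DEFAULT_BLOB_INLINE_SIZE_THRESHOLD),
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|err| format!("invalid threshold {:?}: {}", raw, err)),
    }
}

/// Compare normalized thresholds instead of raw metadata strings, so an
/// explicit default on one side matches a missing key on the other.
fn thresholds_match(
    dataset: &HashMap<String, String>,
    input: &HashMap<String, String>,
) -> Result<bool, String> {
    Ok(effective_threshold(dataset)? == effective_threshold(input)?)
}
```

With this shape, a dataset field with no key and an input field that spells out the default compare as equal, while a genuinely different value still fails the check.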
field
    .metadata()
    .get(BLOB_INLINE_SIZE_THRESHOLD_META_KEY)
    .and_then(|value| value.parse::<usize>().ok())
Invalid threshold metadata is silently ignored here. Since callers can set these keys directly on the schema, a bad value can be persisted while writes fall back to the default layout. Please reject invalid values on the write path.
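A strict variant of the lookup could distinguish "key absent" from "key present but malformed" and surface the latter as an error. This is a minimal sketch under stated assumptions: metadata is modeled as a `HashMap`, the key string is a placeholder, and `parse_threshold_strict` is a hypothetical helper name, not a function from the PR.

```rust
use std::collections::HashMap;

// Hypothetical key string for illustration; the real constant is defined
// in the Lance source.
const BLOB_INLINE_SIZE_THRESHOLD_META_KEY: &str = "example:blob-inline-size-threshold";

/// Returns Ok(None) when the key is absent, Ok(Some(n)) for a valid value,
/// and Err when the value does not parse as usize. Unlike
/// `.and_then(|v| v.parse().ok())`, a malformed value is never treated
/// as if the key were missing.
fn parse_threshold_strict(
    metadata: &HashMap<String, String>,
) -> Result<Option<usize>, String> {
    metadata
        .get(BLOB_INLINE_SIZE_THRESHOLD_META_KEY)
        .map(|raw| {
            raw.parse::<usize>().map_err(|err| {
                format!(
                    "invalid value {:?} for {}: {}",
                    raw, BLOB_INLINE_SIZE_THRESHOLD_META_KEY, err
                )
            })
        })
        .transpose()
}
```

Calling such a helper on the write path and propagating the error would keep a bad value from being persisted alongside data written with the default layout.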
… into xuanwo/blob-v2-inline-threshold
let is_blob_v2_field = input_field
    .metadata
    .get(ARROW_EXT_NAME_KEY)
    .or_else(|| dataset_field.metadata.get(ARROW_EXT_NAME_KEY))
This only falls back to the dataset extension when the input extension key is absent. If the input field has a non-blob extension plus blob threshold metadata, append validation skips the threshold checks and silently drops the input metadata. Please treat either side being blob v2 as requiring validation.
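The difference between the fallback and the "either side" check can be sketched as follows. The snippet models field metadata as a `HashMap`; the blob v2 extension name `lance.blob.v2` and the way the original expression finishes its comparison are assumptions, and both function names are hypothetical.

```rust
use std::collections::HashMap;

const ARROW_EXT_NAME_KEY: &str = "ARROW:extension:name";
// Assumed extension name for blob v2; check the Lance source for the real value.
const BLOB_V2_EXT_NAME: &str = "lance.blob.v2";

fn is_blob_v2(metadata: &HashMap<String, String>) -> bool {
    metadata.get(ARROW_EXT_NAME_KEY).map(String::as_str) == Some(BLOB_V2_EXT_NAME)
}

/// Fallback behaviour as described in the review: the dataset extension is
/// consulted only when the input field carries no extension key at all.
fn needs_validation_fallback_only(
    input: &HashMap<String, String>,
    dataset: &HashMap<String, String>,
) -> bool {
    input
        .get(ARROW_EXT_NAME_KEY)
        .or_else(|| dataset.get(ARROW_EXT_NAME_KEY))
        .map(String::as_str)
        == Some(BLOB_V2_EXT_NAME)
}

/// Suggested behaviour: validate whenever either side is a blob v2 field.
fn needs_validation_either_side(
    input: &HashMap<String, String>,
    dataset: &HashMap<String, String>,
) -> bool {
    is_blob_v2(input) || is_blob_v2(dataset)
}
```

When the input field carries a non-blob extension and the dataset field is blob v2, the fallback version returns `false` and the threshold checks are skipped, while the either-side version returns `true`.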
This adds per-column blob v2 inline threshold metadata so callers can choose when a blob column moves from inline data-file storage to packed sidecar storage, without changing the existing packed sidecar rolling option.
The threshold is stored on the blob field metadata, matching the existing dedicated blob threshold model. Existing blob columns keep their policy in the dataset schema; appends that explicitly provide different threshold metadata are rejected instead of silently ignoring the input schema. The Python and Rust helpers validate threshold values at the API boundary so invalid values do not silently fall back to defaults.
Closes #7268.