Expose protobuf CLI training and compression tuning flags #830

Open
REASY wants to merge 1 commit into facebook:dev from REASY:protobuf-cli-benchmark-knobs

@REASY REASY commented Jun 19, 2026

Expose protobuf CLI training and compression tuning flags

Summary

This PR exposes a few existing protobuf CLI tuning knobs so protobuf_cli can be used for reproducible benchmarking and compression tuning:

  • serialize --compression-level <level>
  • benchmark --compression-level <level>
  • train --trainer greedy|bottom-up|full-split
  • train --threads <n>
  • train --max-time-secs <seconds>

The default behavior is unchanged when these flags are not provided.

Motivation

The protobuf CLI already supports training and serializing protobuf payloads, but several important tuning parameters could only be changed by editing defaults in code. That makes it hard to compare OpenZL behavior across compression levels, trainer choices, thread counts, or bounded training budgets.

These flags are useful for:

  • benchmarking OpenZL against other compressors with controlled compression levels;
  • comparing OpenZL trainers on the same dataset;
  • making training faster or more predictable with --threads;
  • preventing long-running training jobs with --max-time-secs;
  • improving reproducibility of CLI-based experiments.

Changes

Compression level

Adds ProtoSerializer::setCompressionLevel(int level), which forwards to:

cctx_.setParameter(CParam::CompressionLevel, level);

The protobuf CLI now applies this setting for:

protobuf_cli serialize ... --compression-level 9
protobuf_cli benchmark ... --compression-level 9
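
For readers unfamiliar with the forwarding pattern, here is a minimal, self-contained sketch of how such a setter might look. FakeCCtx, FakeCParam, and FakeSerializer are hypothetical stand-ins written for this illustration; they are not the OpenZL types, and the real ProtoSerializer in this PR may differ in every detail except the forwarding call quoted above.

```cpp
#include <map>

// Hypothetical stand-in for the parameter enum named in the PR.
enum class FakeCParam { CompressionLevel };

// Hypothetical stand-in for the compression context: it only records parameters.
class FakeCCtx {
public:
    void setParameter(FakeCParam param, int value) { params_[param] = value; }

    // Returns the stored value, or `fallback` if the parameter was never set.
    int getParameter(FakeCParam param, int fallback) const {
        auto it = params_.find(param);
        return it == params_.end() ? fallback : it->second;
    }

private:
    std::map<FakeCParam, int> params_;
};

// Hypothetical serializer that owns a context and forwards the level to it.
class FakeSerializer {
public:
    void setCompressionLevel(int level) {
        cctx_.setParameter(FakeCParam::CompressionLevel, level);
    }

    const FakeCCtx& cctx() const { return cctx_; }

private:
    FakeCCtx cctx_;
};
```

Because the setter is only invoked when the flag is present, a caller that never calls it leaves the context untouched, which is how the sketch keeps the default behavior unchanged.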

Trainer selection

Adds CLI parsing for the existing training::ClusteringTrainer variants:

protobuf_cli train ... --trainer greedy
protobuf_cli train ... --trainer bottom-up
protobuf_cli train ... --trainer full-split
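
As an illustration only, the mapping from the three flag values to a trainer variant could be sketched as below. TrainerChoice and parseTrainer are hypothetical names invented for this example; the actual training::ClusteringTrainer enumerators and the parsing code in this PR are not shown in the description and may be spelled differently.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum mirroring the three values accepted by --trainer.
enum class TrainerChoice { Greedy, BottomUp, FullSplit };

// Maps a flag value to a trainer choice; returns std::nullopt for unknown names
// so the caller can print a usage error instead of silently picking a default.
std::optional<TrainerChoice> parseTrainer(std::string_view name) {
    if (name == "greedy") return TrainerChoice::Greedy;
    if (name == "bottom-up") return TrainerChoice::BottomUp;
    if (name == "full-split") return TrainerChoice::FullSplit;
    return std::nullopt;
}
```

Returning an optional rather than a default value is a design choice of this sketch: a mistyped trainer name fails loudly, which matters when the flag is used for reproducible comparisons.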

Training controls

Adds CLI flags that map to existing training::TrainParams fields:

protobuf_cli train ... --threads 8
protobuf_cli train ... --max-time-secs 120
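
One way to keep defaults intact when these flags are absent is to parse them into optional fields and only override the training parameters that were actually supplied. The sketch below assumes this approach; TrainOverrides, parsePositive, and parseTrainFlags are hypothetical names, the real training::TrainParams field names are not shown in this description, and rejecting zero is an assumption of the example rather than documented CLI behavior.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical holder; an unset field means "keep the existing default".
struct TrainOverrides {
    std::optional<unsigned> threads;
    std::optional<unsigned> maxTimeSecs;
};

// Parses a strictly positive decimal integer. Rejects empty input, signs,
// trailing characters, zero, and values that overflow unsigned.
std::optional<unsigned> parsePositive(std::string_view text) {
    unsigned value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last || value == 0) return std::nullopt;
    return value;
}

// Scans argv-style tokens for --threads and --max-time-secs, ignoring others.
// Returns std::nullopt if a flag is missing its value or the value is malformed.
std::optional<TrainOverrides> parseTrainFlags(
        const std::vector<std::string_view>& args) {
    TrainOverrides out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        std::optional<unsigned>* slot = nullptr;
        if (args[i] == "--threads") {
            slot = &out.threads;
        } else if (args[i] == "--max-time-secs") {
            slot = &out.maxTimeSecs;
        } else {
            continue;
        }
        if (i + 1 >= args.size()) return std::nullopt;
        auto value = parsePositive(args[++i]);
        if (!value) return std::nullopt;
        *slot = value;
    }
    return out;
}
```

A caller would then copy each field into the training parameters only when it holds a value, so an invocation without the flags behaves exactly as before.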

Backward Compatibility

All new flags are optional.

Existing invocations continue to use the same defaults as before:

  • default serializer compression behavior is unchanged;
  • default trainer selection remains unchanged;
  • default training thread/time behavior remains unchanged.

No serialization format changes are introduced.

Example

protobuf_cli train \
  --descriptor schema.desc \
  --message-type my.package.Message \
  --input train \
  --output trained.zlc \
  --trainer greedy \
  --threads 8 \
  --max-time-secs 300

protobuf_cli serialize \
  --descriptor schema.desc \
  --message-type my.package.Message \
  --input message.binpb \
  --output message.zl \
  --compressor trained.zlc \
  --compression-level 9

Testing

Built protobuf_cli successfully with:

cmake \
  -S . \
  -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DOPENZL_BUILD_PROTOBUF_TOOLS=ON

cmake --build build --target protobuf_cli -j 8

meta-cla Bot commented Jun 19, 2026

Hi @REASY!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla Bot commented Jun 19, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla Bot added the "cla signed" label Jun 19, 2026