chore: Replace python parquet generation script with ts#1876

Open
ZhongpinWang wants to merge 8 commits into main from refactor-generate-parquet-in-ts

Conversation

ZhongpinWang commented May 20, 2026

Contributor

Update:

Since it is currently not possible to test whether the generated Parquet file works with the context registry service, I will pause this PR for now.


As in the title.

I also added a flag to the generation script that leaves out items with a [PREDICT] placeholder. This prepares for the upcoming context registry feature support.

@hyperspace-insights

Contributor

Summary

The following content is AI-generated and provides a summary of the pull request:


Replace Python Parquet Generation Script with TypeScript

Chore

♻️ Replaces the existing Python script for generating Parquet files with TypeScript equivalents, aligning the tooling with the rest of the project's TypeScript-based stack.

Changes

  • .gitignore: Added generated directory to the ignore list to exclude locally generated Parquet files from version control.
  • package.json: Added @dsnp/parquetjs as a dev dependency to enable Parquet file generation in TypeScript.
  • pnpm-lock.yaml: Updated lock file with @dsnp/parquetjs and its transitive dependencies (including AWS SDK, Smithy, Thrift, and other supporting libraries).
  • scripts/generate-parquet.ts: New TypeScript script that generates a sample payments.parquet file with customer payment data, saved to a local generated/ directory.
  • sample-code/resources/generate-parquet.ts: New TypeScript script that generates product data Parquet files for use with SAP RPT, replacing the removed Python script. Supports a --no-predict flag to optionally exclude rows with [PREDICT] placeholders.
  • sample-code/resources/generate_parquet.py: Removed — replaced by the TypeScript script above.
  • sample-code/resources/product_data_with_placeholders.parquet: New Parquet file containing product data rows including [PREDICT] placeholder entries, generated by the new script.
  • sample-code/resources/product_data.parquet: Updated Parquet file (regenerated via the new TypeScript script).
  • sample-code/src/rpt.ts: Updated the Parquet file reference from product_data.parquet to product_data_with_placeholders.parquet to use the new file that includes prediction placeholder rows.
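The --no-predict behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the `Row` type, the column names, and the helpers `hasPredictPlaceholder` and `selectRows` are hypothetical, and it assumes a placeholder row is one where some cell equals the literal string `[PREDICT]`.

```typescript
// Hypothetical row shape; the real script's columns are not shown on this page.
type Row = Record<string, string | number>;

const PREDICT_PLACEHOLDER = "[PREDICT]";

// True when any cell of the row is exactly the placeholder string.
function hasPredictPlaceholder(row: Row): boolean {
  return Object.values(row).some((value) => value === PREDICT_PLACEHOLDER);
}

// Returns the rows to write, given the raw CLI arguments.
// With --no-predict, rows containing a placeholder are dropped;
// without it, all rows are kept unchanged.
function selectRows(rows: Row[], argv: string[]): Row[] {
  const noPredict = argv.includes("--no-predict");
  return noPredict ? rows.filter((row) => !hasPredictPlaceholder(row)) : rows;
}

// Illustrative sample data (assumed, not taken from the PR).
const sampleRows: Row[] = [
  { PRODUCT: "Laptop", PRICE: 999, CATEGORY: "Electronics" },
  { PRODUCT: "Mouse", PRICE: 25, CATEGORY: "[PREDICT]" },
];
```

In a Node script, the arguments would typically come from `process.argv.slice(2)`, so one invocation could produce both the file with placeholders and the file without them.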

PR Bot Information

Version: 1.20.51

  • Event Trigger: pull_request.opened
  • LLM: anthropic--claude-4.6-sonnet
  • File Content Strategy: Full file content
  • Output Template: Default Template
  • Summary Prompt: Default Prompt
  • Correlation ID: 28e0c725-1921-43d4-a694-e43ff32494ac

Comment thread package.json Outdated
  "typescript": "^6.0.3",
- "zod": "^4.4.3"
+ "zod": "^4.4.3",
+ "@dsnp/parquetjs": "^1.8.7"

Member

[pp] I would prefer something lighter (dependency-tree wise), e.g. hyparquet-writer, parquet-wasm or @duckdb/duckdb-wasm.

Contributor Author

I picked this one only because it is quite popular. Also, different Parquet libraries implement the format differently. For now I can't really test whether the generated Parquet file works with our services (they might expect a Parquet file exported from HANA). We can hold this PR for a moment and see whether the package works in the end.

Member

[nth] Consider enabling compression. This file is quite a bit larger despite similar contents.
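One possible way to act on this, sketched under assumptions: parquetjs-style schemas are built from a plain object of field definitions, and the upstream parquetjs README documents a per-field `compression` option. Whether @dsnp/parquetjs accepts the same option and codec names should be checked against its documentation. The helper `withCompression`, the `FieldDef` type, and the column names below are hypothetical.

```typescript
// Plain-object field definition in the style of parquetjs schemas (assumed shape).
type FieldDef = { type: string; optional?: boolean; compression?: string };

// Codec names as listed in the parquetjs README; support in the fork is an assumption.
type Codec = "UNCOMPRESSED" | "GZIP" | "SNAPPY";

// Returns a copy of the field definitions with the given codec set on every column.
// The input object is left unmodified.
function withCompression(
  fields: Record<string, FieldDef>,
  codec: Codec
): Record<string, FieldDef> {
  const result: Record<string, FieldDef> = {};
  for (const [name, def] of Object.entries(fields)) {
    result[name] = { ...def, compression: codec };
  }
  return result;
}

// Illustrative columns (assumed, not taken from the PR).
const productFields: Record<string, FieldDef> = {
  PRODUCT: { type: "UTF8" },
  PRICE: { type: "DOUBLE" },
  CATEGORY: { type: "UTF8", optional: true },
};

const compressedFields = withCompression(productFields, "SNAPPY");
// Assumed usage: pass compressedFields to the library's schema constructor,
// e.g. new ParquetSchema(compressedFields), before opening the writer.
```

Comparing the size of the regenerated file against the previous product_data.parquet would show whether the codec closes the gap.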

Labels

blocked Issue or PR blocked due to other issues.

2 participants