disable wrong parallel decompression for LOAD DATA by robll-v1 · Pull Request #23620 · matrixorigin/matrixone

robll-v1 · 2026-01-29T07:40:02Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

Summary

Parallel LOAD DATA on compressed files is unsafe because the current implementation splits the input by byte offsets and then tries to locate line boundaries in the compressed stream. This can cut rows in the middle, misalign columns, and ultimately write corrupted values (e.g., invalid DATE/DECIMAL). The corruption surfaces later as panics during query output (e.g., Date.ToBytes out-of-range).

Root Cause

ReadFileOffset computes offsets as Offset = previous + batchSize and then calls getTailSize on the compressed reader. Byte offsets into a compressed stream do not correspond to positions in the decompressed text, so this is not a valid line-boundary strategy for compressed streams and leads to row corruption.

Fix (medium-cost path)

Keep parallel write, but disable parallel splitting for compressed files.
For compressed data, the load is handled by a single reader (proper decompression + parsing), and the insert stage still runs in parallel.
Avoid the “string vector + cast projection” path for compressed files to prevent misalignment.

Tests

Added a unit test that asserts ReadFileOffset rejects compressed input.
This locks in the rejection of the unsafe path and prevents regressions.
Note: If we later refactor to truly support parallel splitting for compressed files (e.g., with a splittable format), this test must be updated or removed.

Behavior After Fix

Compressed LOAD DATA no longer panics.
Parallelism is preserved on the write side; read-side splitting is disabled for compressed inputs.

CLAassistant · 2026-01-29T07:40:13Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

fix select err

4d4badd

robll-v1 requested review from aunjgr and ouyuanning as code owners January 29, 2026 07:40

matrix-meow added the size/S Denotes a PR that changes [10,99] lines label Jan 29, 2026

Merge branch 'main' into fix/load

edddada

mergify bot added the kind/bug Something isn't working label Jan 29, 2026

fix the err test

7c6c252

mergify bot merged commit 1ac8002 into matrixorigin:main Jan 29, 2026
23 of 24 checks passed

mergify bot removed the queued label Jan 29, 2026

mergify[bot] merged 5 commits into matrixorigin:main from robll-v1:fix/load

4 participants
