Core: update manifest delete file size after rewrite table action #15470
mbutrovich wants to merge 14 commits into …
Conversation
| "1.10.0": | ||
| org.apache.iceberg:iceberg-api: | ||
| - code: "java.class.defaultSerializationChanged" | ||
| old: "class org.apache.iceberg.encryption.EncryptingFileIO" | ||
| new: "class org.apache.iceberg.encryption.EncryptingFileIO" | ||
| justification: "New method for Manifest List reading" | ||
| org.apache.iceberg:iceberg-core: | ||
| - code: "java.class.noLongerInheritsFromClass" |
Was this moved unintentionally?
It's just what `./gradlew revapi` generated for me. As best as I can tell, this is what it did:
- The existing 1.10.0 block moved earlier in the file (alphabetical sort?)
- The YAML indent for the `- code:` entries changed from 4-space to 2-space
- Our four new entries were added (one `java.element.noLongerDeprecated`, two `java.method.numberOfParametersChanged`, and one `java.method.removed` for `rewriteDeleteManifest`)
Could you keep only the four new entries and revert the reordering/re-indentation of the existing 1.10.0 block? Manually adding the new entries into the existing block (keeping the original 4-space indent and ordering) would make the diff much smaller and easier to review.
I just want to mention that we must not break the API here. Please follow https://iceberg.apache.org/contribute/#minor-version-deprecations-required
Force-pushed from 277c0e2 to 8922c70.
hmmm. In #15586 I've been working on skipping the HEAD call within the s3afs and azure abfs client openFile() calls, and had been thinking that the manifest data would be best, as it'd save another ~100 ms/file on what is often a critical path. We need accurate manifests for that.
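For context on why the recorded length matters here: with Hadoop 3.3.5+ a reader can declare a known length to `openFile()` and the client can skip the HEAD/getFileStatus round trip entirely. A minimal sketch, assuming the standard `fs.option.openfile.*` option keys; `knownLength` would come from the manifest's `file_size_in_bytes`, and the helper name is invented for illustration:

```java
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class OpenWithKnownLength {
  public static FSDataInputStream open(FileSystem fs, Path path, long knownLength)
      throws Exception {
    CompletableFuture<FSDataInputStream> future =
        fs.openFile(path)
            // Declaring the length up front lets S3A/ABFS elide the HEAD request;
            // a wrong length causes exactly the "end of stream not reached" class
            // of failure this PR fixes.
            .opt("fs.option.openfile.length", Long.toString(knownLength))
            .opt("fs.option.openfile.read.policy", "random")
            .build();
    return future.get();
  }
}
```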
// manifest was not updated. Readers that use file_size_in_bytes to elide a stat() call may
// fail.
@TestTemplate
public void testDeleteFileSizeInBytesAfterRewrite() throws Exception {
Are there equivalents of this for the other operations which add files to the manifests? If so, they'd be handy to add for regression testing there too, though not in this PR.
steveloughran left a comment:
reviewed again. I think changes within RewriteTablePathUtil can eliminate the need for a HEAD call with a v3 puffin file.
String stagingPath = stagingPath(file.location(), sourcePrefix, stagingLocation);
OutputFile outputFile = io.newOutputFile(stagingPath);
try {
  rewritePositionDeleteFile(
If this could actually return the length of the file written (which would need changes in rewriteDVFile() to do the same), there'd be no need for the HEAD call when rewriting puffin files.
It'd probably still be needed in v2 rewrites, but there the getLength() call should be pushed into rewritePositionDeleteFile(), which would only invoke it on the v2 codepath.
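A minimal sketch of what returning the written length could look like, using Iceberg's `PositionOutputStream.getPos()`. `copyRecords()` is a hypothetical stand-in for the per-format rewrite logic; in the real code the format writer's own `length()` after close would be the authoritative value, since a Puffin or Parquet footer is written at close:

```java
import java.io.IOException;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;
import org.apache.iceberg.io.PositionOutputStream;

final class RewriteLengthSketch {
  // Returns the number of bytes written, so the caller can record
  // file_size_in_bytes without a follow-up HEAD/getLength() call.
  static long rewriteWithLength(InputFile source, OutputFile target) throws IOException {
    try (PositionOutputStream out = target.createOrOverwrite()) {
      copyRecords(source, out); // hypothetical: rewrite embedded paths while copying
      return out.getPos(); // bytes written so far, captured before close
    }
  }

  private static void copyRecords(InputFile source, PositionOutputStream out) {
    // placeholder for the actual position-delete rewrite
  }
}
```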
…ask, eliminating the separate `rewritePositionDeletes()` Spark job. Each manifest-writing task now also rewrites the delete files that manifest references, measures the actual size via `getLength()`, and records it in the manifest entry.
…ustification "rewriteDeleteManifest signature changed to accept PositionDeleteReaderWriter for inline position delete rewriting; rewritePositionDeleteFile now returns file size to avoid getLength HEAD call".
Force-pushed from 58725c3 to 78107a1.
steveloughran left a comment:
I like the change.
@mbutrovich as you're changing the signature of …
Sounds good, thanks @wypoon! @anuragmantri suggested I do the change in one version first to make it easier for reviewers (reduce the diff), get consensus on the approach, and then once folks are good with it I can generate the changes for the other versions. I'm happy to do that at any time.
@mbutrovich it is true that having changes for multiple versions of Spark and Flink clutters up a PR and makes reviewing more difficult. However, you can at least put up the changes for one version of Flink and for Delta conversion, and then later for all the versions.
@mbutrovich |
This is my first upstream contribution to Iceberg Java, so happy to do so if the community requires. Is there a process to propose that besides the mailing list?
…nifest tasks to avoid FileAlreadyExistsException on the shared staging path.
I took another look. The PR looks good to me.
The process involves merging the PR into the main branch and then creating a PR against the 1.10.x branch. I don't believe the 1.10.2 maintenance release is out yet; if we can make it before that, we could include this one in the release. You can also reply to the thread if you'd like, @mbutrovich: https://lists.apache.org/thread/5442q5bddyoss9mw27r0zbnobysfpn0t
// Another task in this Spark job already staged this file. Rewriting is deterministic, so
// its content (and therefore length) matches what this task would have produced.
if (e.getCause() instanceof FileAlreadyExistsException) {
  return io.newInputFile(stagingPath).getLength();
@steveloughran flagged the HDFS in-progress-write issue earlier:
There's actually something bad with hdfs here where the length of in-progress-writes can underreport length...it's only after close() that everything syncs up.
I think the concern applies here too: if Task A is still writing when Task B reaches getLength(), B reads an in-progress length on HDFS and gets the same class of bug this PR is fixing.
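If that race is real, one conceivable fence, sketched here with raw Hadoop `FileSystem` calls rather than Iceberg's `FileIO` (which exposes no rename): each task writes to a task-unique temporary path and renames into place, so any file visible at the shared staging path is complete. This assumes atomic rename semantics (HDFS); object stores would need a different mechanism. `writeRewrittenFile` is a placeholder for the actual rewrite:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class StagingRaceSketch {
  static long stageDeterministic(FileSystem fs, Path staging, String taskId)
      throws IOException {
    Path tmp = new Path(staging.getParent(), staging.getName() + ".tmp-" + taskId);
    long length = writeRewrittenFile(fs, tmp); // returns bytes written after close
    if (!fs.rename(tmp, staging)) {
      // Another task won the race. Its output is byte-identical because the
      // rewrite is deterministic, and the visible file is complete (it was
      // renamed into place after close), so its status length is safe to trust.
      fs.delete(tmp, false);
      length = fs.getFileStatus(staging).getLen();
    }
    return length;
  }

  private static long writeRewrittenFile(FileSystem fs, Path path) throws IOException {
    return 0L; // placeholder for the actual position-delete rewrite
  }
}
```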
Which issue does this PR close?
Closes #12554.
Rationale for this change
`rewriteTablePath` rewrites position delete files (updating embedded data file paths), which changes their size. The manifest was written before the delete files were rewritten, so `file_size_in_bytes` in the manifest reflected the original size. Readers that trust this field (Trino, Impala, Comet, iceberg-rust) fail with errors like "end of stream not reached."

What changes are included in this PR?
Moves position delete file rewriting into the manifest-writing Spark task, eliminating the separate `rewritePositionDeletes()` Spark job. Each manifest-writing task now also rewrites the delete files that manifest references, measures the actual size via `writer.length()` (or `getLength()` on the staged file when another task got there first), and records it in the manifest entry. This means:

- Correct `file_size_in_bytes` at manifest write time, no reconciliation needed

Trade-off: if the same delete file appears in multiple manifests, it gets rewritten redundantly. The staging path is deterministic, so the output is identical: wasted I/O, not incorrect results. In practice this is rare. A rough sketch of the per-task recording step follows.
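The sketch below is not the PR's literal code; it illustrates the recording step under the assumption that `FileMetadata`'s delete-file builder is used to copy and patch the entry, where `measuredLength` is whatever the inline rewrite returned:

```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileMetadata;
import org.apache.iceberg.ManifestWriter;
import org.apache.iceberg.PartitionSpec;

final class ManifestEntrySketch {
  // measuredLength is writer.length() from the inline rewrite, or
  // getLength() on the staged file when another task staged it first.
  static void addRewrittenDelete(
      ManifestWriter<DeleteFile> writer,
      PartitionSpec spec,
      DeleteFile original,
      String rewrittenPath,
      long measuredLength) {
    DeleteFile updated =
        FileMetadata.deleteFileBuilder(spec)
            .copy(original)
            .withPath(rewrittenPath)
            .withFileSizeInBytes(measuredLength) // the field this PR fixes
            .build();
    writer.add(updated);
  }
}
```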
How are these changes tested?
New test `testDeleteFileSizeInBytesAfterRewrite` creates a table with position deletes using a deeply nested path (so the rewritten path differs in length), runs `rewriteTablePath`, copies the result, and asserts that `file_size_in_bytes` in the rewritten manifest matches the actual file size on disk. The core assertion is sketched below.
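A condensed sketch of that check, not the test's literal code; it assumes AssertJ's `assertThat` is statically imported and `table` is a handle for the copied table:

```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.io.CloseableIterable;

// inside the test, after rewriteTablePath has run and files were copied
for (ManifestFile manifest : table.currentSnapshot().deleteManifests(table.io())) {
  try (CloseableIterable<DeleteFile> deletes =
      ManifestFiles.readDeleteManifest(manifest, table.io(), table.specs())) {
    for (DeleteFile delete : deletes) {
      long actualSize = table.io().newInputFile(delete.location()).getLength();
      // the manifest's recorded size must match the bytes actually on disk
      assertThat(delete.fileSizeInBytes()).isEqualTo(actualSize);
    }
  }
}
```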
AI Usage
I am more familiar with the iceberg-rust codebase, so Claude helped me navigate the code, prototype a design, and draft the PR description (in DataFusion Comet's PR template). Claude also helped me with the API change failures in CI.